References marked with "*" mean the section named has an initial section and subsections, but the reference here is only to the initial section. Otherwise a reference means all subsections as well.

If I code an entire function, I annotate it "WRITE" (I may change it later). If I only code part of it, I annotate it "START", and then when I return to it, "MORE", or "FINISH" for when it's conceptually done (again, I may change it later). (If I stop writing a function to write a little leaf function it calls, I don't bother indicating this as a START/FINISH, I just log them separately.)

------------ START SESSION --------------
==== 2:11 AM Monday, April 02, 2007

Instant Messenger conversation with Casey (the IM log is provided elsewhere) for gathering intermediate data that's clearly defined by the specification to use to help validate individual subsystems of the new implementation. (Continues overlapping with below.)

== 2:30 AM Monday, April 02, 2007

WRITE start_page()   ogg page header parser
   reference: ogg spec: "page header"
WRITE capture_pattern()   parse just the first 4 header bytes
   reference: ogg spec: "page header"
WRITE error()   error-handling convenience function
WRITE vorbis_validate()   check a vorbis-packet header
   reference: vorbis spec--4.2.1
WRITE get8()   various file/memory reading functions
WRITE get32()   (endian-independent)
WRITE getn()
START start_decoder()   vorbis file header parser--packet 1
   reference: vorbis spec--4.2.2
   notes: discard granule position for now
   notes: discard sequence number, serial number for now
WRITE vorbis_alloc()   allocate and zero memory for the main structure
WRITE vorbis_free()   free structure and all pointers inside it
WRITE vorbis_file()
WRITE vorbis_filename()
WRITE vorbis_memory()   allocate a vorbis structure and call start_decoder(), handle errors
WRITE start_packet()
WRITE next_segment()
   reference: ogg spec--"packet segmentation"
WRITE skip()   another file-reading function
MORE start_decoder()   packet 2: skip the "comment" packet
   reference: vorbis spec--4.2.3
   notes: does RAD want it?

== 4:18 AM Monday, April 02, 2007

WRITE get8_packet()
WRITE get_bits()   arbitrary-bit-length integer reader from packet
   reference: vorbis spec--2.1.4
WRITE get_bits_signed()   signed-version of get_bits()
WRITE start_decoder()   packet 3: setup packet
   reference: vorbis spec--4.2.4
   reference: vorbis spec--3.2.1*, 3.2.2.2
   reference: vorbis spec--6.2.1
   reference: vorbis spec--7.2.2*
   reference: vorbis spec--8.6.1
   SPECIFICATION ISSUE: What to do with mapping->mux[] if submaps <= 1?
   notes: come back to it when we have code using it
WRITE ilog()   integer log, modified from stb.h (PD)
   notes: have to add one to all the table entries, and handle signed values
WRITE float32_unpack()   unpack a 32-bit float
   reference: vorbis spec--9.2.2
   notes: is this just a little-endian IEEE float? maybe? try it.
WRITE lookup1_values()   determine # of values for a codebook-lookup-mode-1
   reference: vorbis spec--9.2.3
WRITE compute_codewords()   compute the actual huffman encodings
   reference: vorbis spec--3.2.1.1
   SPECIFICATION ISSUE: specification is slightly ambiguous: lengths can be out of order, but in all the examples the huffman numbering and the entries are in the same order. If this is an actual limitation, there's a much simpler implementation.
   notes: try the easy implementation
WRITE bit_reverse()   32-bit bit reversal from stb.h (PD)

Grab DirectSound etc. code from my music composing software, and use that to write the sample app (so eventually we can make it play the audio data). Call vorbis_file() with it.

Switch to using vorbis_memory() so I can look at the entire stream while debugging.

BUGFIX start_decoder()   a few typos/brainos etc.
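A minimal sketch of the two small helpers logged above, ilog() and lookup1_values(), following the definitions in the Vorbis spec. The `_sketch` names are hypothetical, and this is not the table-driven stb.h version the log refers to. Plain loops are used so the "add one" behavior (ilog(1) is 1, not 0) and the rule that zero and negative inputs give 0 are easy to see.

```c
#include <stdint.h>

/* ilog per the Vorbis spec: position of the highest set bit, counting
   from 1. Zero and negative inputs return 0. */
static int ilog_sketch(int32_t x)
{
    int n = 0;
    if (x <= 0) return 0;
    while (x > 0) { ++n; x >>= 1; }
    return n;
}

/* lookup1_values per the Vorbis spec: the largest integer r such that
   r^dim <= entries. Integer-only, so no floating-point rounding issues. */
static int lookup1_values_sketch(int entries, int dim)
{
    int r = 0;
    if (entries <= 0 || dim <= 0) return 0;
    for (;;) {
        int64_t p = 1;
        int i;
        /* compute (r+1)^dim, stopping as soon as it exceeds entries */
        for (i = 0; i < dim; ++i) {
            p *= (r + 1);
            if (p > entries) return r;
        }
        ++r;   /* (r+1)^dim still fits, so try the next candidate */
    }
}
```

For example, a codebook with 27 entries and 3 dimensions gives 3 values per dimension, while 26 entries gives only 2.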
WRITE flush_packet()   skip to end of current packet
WRITE get8_packet_raw()   split out from get_packet() to implement flush_packet()

==== 7:39 AM Monday, April 02, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 4:13 PM Monday, April 02, 2007

WRITE start_page_no_capturepattern()   split out from start_page(), allow seeking
START vorbis_decode_packet()   read and decode one packet/frame
   reference: vorbis spec-4.3*, 4.3.1, 4.3.2
   notes: decode window into explicit array for now
   reference: vorbis spec-7.2.2.1
   (skip Floor type 0 for now, since it's not used currently)
WRITE codebook_decode_scalar()   brute force huffman decoding
WRITE prep_huffman()   prepare the bit-buffer for huffman decode
WRITE neighbors()   compute adjacent points to interpolate from
   reference: vorbis spec-9.2.4*, 9.2.4.1
WRITE predict_point()   called "render_point" in spec
   reference: vorbis spec-9.2.4.2
MORE vorbis_decode_packet()   more floor-1 code
   notes: probably a better way to do this than calling qsort().. the x-coordinates are fixed so can "pre-sort"?
WRITE point_compare()   compare points for qsort() above
WRITE draw_line()   called "render_line" in spec
   reference: vorbis spec-9.2.4.3
cut and paste "floor1 inverse dB static table" from spec.
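A minimal sketch of the predict_point() step logged above, following the integer arithmetic the spec gives for "render_point". The function name and argument order here are hypothetical (they mirror the spec's x0, y0, x1, y1, X ordering, not necessarily the code written in this session). Using the absolute value of dy means the offset is always rounded toward y0, whichever way the line slopes.

```c
#include <stdlib.h>

/* Interpolate the y value at x on the line (x0,y0)-(x1,y1) using only
   integer math, as in the spec's render_point. Requires x1 > x0. */
static int render_point_sketch(int x0, int y0, int x1, int y1, int x)
{
    int dy  = y1 - y0;
    int adx = x1 - x0;
    int err = abs(dy) * (x - x0);
    int off = err / adx;            /* integer division, truncates */
    return dy < 0 ? y0 - off : y0 + off;
}
```

At x == x0 this returns y0 and at x == x1 it returns y1, so the predicted point always lies between the two neighbors.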
MORE vorbis_decode_packet()   residue decode
   reference: vorbis spec-4.3.3, 4.3.4
   SPECIFICATION ISSUE: to answer earlier question, it looks like mapping->mux should be all 0 if submaps<=1
MORE start_decoder()
   notes: add more per-channel buffers to store the residues into
WRITE decode_residue()   residue decoding
   reference: vorbis spec-8.6.2

==== 8:00 PM Monday, April 02, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 11:22 PM Monday, April 02, 2007

WRITE residue_decode()   residue decoding low-level
   reference: vorbis spec-8.6.3, 8.6.4
WRITE codebook_decode()   decode from codebook in vector context
   reference: vorbis spec-3.2.1.2.1, 3.2.1.2.2
WRITE inverse_mdct()   inverse MDCT
   reference: Sporer et al page 2
   notes: naive O(N^2)
   reference: M_PI from CRC Standard Mathematical Tables & Formulas
FINISH vorbis_decode_packet()
   notes: skip windowing for now
WRITE vorbis_get_frame()   simple streaming API
   notes: ignore window overlapping, just return internal buffers

Test jig: call vorbis_get_frame() until it returns a 0 length. Convert float buffers to 16-bit integer.
   SPECIFICATION ISSUE: (what's the scaling factor? Probably 32767.)

DEBUGGING

REWRITE compute_codewords()
   notes: ok, the huffman stuff is totally broken. probably wrong assumption
   reference: vorbis spec--3.2.1.1
   notes: hey, turns out there's a simple way to do "unsorted" case too
BUGFIX compute_codewords()
   notes: clear available[z] after allocating it
   notes: (1 << (32-y)), not (1 << (32-i))
BUGFIX neighbors()
   notes: what, no, you can't just find the smallest and largest elements
   notes: wtf was I on when I wrote this?
   reference: vorbis spec-9.2.4*, 9.2.4.1
BUGFIX float32_unpack()   unpack a 32-bit float
   notes: no, this _isn't_ just a little-endian IEEE float. code it.
   reference: vorbis spec--9.2.2

Ok, successfully output a waveform that resembles the shark fin in test1.ogg (plus other wonkiness due to non-windowing presumably), but it's scaled wrong.
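A minimal sketch of the float32_unpack() fix noted above, following the layout in vorbis spec 9.2.2: a 21-bit mantissa, a 10-bit exponent biased by 788, and a sign bit. The `_sketch` name is hypothetical, and ldexp() is just one convenient way to apply the power of two. The last line of the usage note shows why reading the bits as an IEEE float does not work.

```c
#include <math.h>
#include <stdint.h>

/* Unpack the Vorbis codebook float format (not IEEE 754):
   bits 0..20  mantissa (unsigned integer)
   bits 21..30 exponent, biased by 788
   bit  31     sign */
static float float32_unpack_sketch(uint32_t x)
{
    uint32_t mantissa = x & 0x1fffff;
    uint32_t sign     = x & 0x80000000;
    int      exponent = (int)((x & 0x7fe00000) >> 21);
    double   res      = sign ? -(double) mantissa : (double) mantissa;
    return (float) ldexp(res, exponent - 788);   /* res * 2^(exponent-788) */
}
```

With this layout 0x62800001 unpacks to 1.0 and 0x62600003 to 1.5, while 0x3F800000 (which is 1.0 as an IEEE float) unpacks to 0.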
==== 4:22 AM Tuesday, April 03, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 8:13 PM Tuesday, April 03, 2007

BUGFIX: range_list must be indexed by multiplier-1, not multiplier
   notes: caught this with the dump of the internal state
BUGFIX: IMDCT output is scaled incorrectly
   reference: wikipedia MDCT
   notes: Sporer et al calls for scaling the IMDCT by the blocksize; this is echoed by wikipedia. Neither formula (they look mathematically equivalent) produces correct results. It looks like different blocks are being scaled differently wrong, though. (Internal dump very helpful here, since I can see the input to the IMDCT is correct.)
   notes: What eventually worked for me was NOT scaling the IMDCT at all; probably the vorbis encoder pre-multiplies the data to "speed up" the decoder?
   SPECIFICATION ISSUE: this should be documented in the spec
FIX: vorbis_decode_packet()   apply the window
FIX: vorbis_get_frame()   compute and apply the overlapping; discard first frame

Ok, successfully decoded test1.ogg to a buffer. Scaling looks basically right.

BUGFIX: residue type 2 is totally screwed up. rewrite

Hooray! Successfully decoded test3.ogg to a file. At least it sounds right.

TESTING: compare output to libvorbis
   notes: libvorbis output computed using oggdrop
   notes: compute delta by inverting and mixing in soundforge
   notes: garbage at the start of oggdrop output; have to delete and tweak to line up
   notes: nearly perfect; every sample is within +-2... (kind of weird)

== 11:57 PM Tuesday, April 03, 2007

BUGFIX: change float-to-int scale to 32768 (from 32767)
   notes: now soundforge shows +-1
   notes: can still hear the music in the +-1
   notes: but maybe that's normal?
   notes: is this just quantization noise? or maybe rounding?
BUGFIX: replace float-to-int (trunc towards 0?)
   with fast-float-to-int + round
   notes: use old-school store in double, read mantissa trick
   notes: now +- 1 but mostly 0, and just static, can't hear the music in it
WRITE inverse_mdct_fast(), NlogN
   reference: Sporer et al, page 2-3

DEBUGGING
   notes: inverse_mdct_fast totally fails
   notes: because of the weird sparseness in arrays, there's probably problems there
BUGFIX: some array entries aren't set
   notes: make parallel structures indicating which entries are valid
   notes: the "bitreverse" step wasn't copying some entries (would be fine if it were in-place)
BUGFIX: some array entries set without being used
   notes: the "butterfly" step (I think?) is overwriting instead of swapping
   notes: probably just left out a swapping step in the notation
BUGFIX: some sign errors in what I typed
   notes: how to debug a fast IMDCT (when you have no idea what the correct intermediate values are): set input vector to all 0s and one 1. if the output matches known good IMDCT, move the 1 to a different element. If output does not match a known good IMDCT, single step through the algorithm and look at each place where the value is used (and keep track of it), and double-check those formulae seem sane (many values will be 0 at first, so easy to think through)

Fast IMDCT now works!

Freeze this version as "oggvorbis_clean.c", since it's now fast enough to be useable. (2x or 3x too slow, but at least real time.)
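A minimal sketch of the "store in double, read mantissa" float-to-int trick mentioned earlier in this session, combined with the 32768 scale. The function names are hypothetical and this is not necessarily the exact code used here. It assumes IEEE-754 doubles evaluated in double precision (e.g. SSE2 rather than x87 extended precision), the default round-to-nearest mode, and inputs with magnitude below 2^31.

```c
#include <stdint.h>
#include <string.h>

/* Adding 1.5 * 2^52 forces the integer part of x into the low bits of
   the double's mantissa, rounded to nearest (ties to even). The low
   32 bits are then the two's complement result. */
static int32_t fast_round_sketch(double x)
{
    double   t = x + 6755399441055744.0;   /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &t, sizeof bits);        /* read the raw bit pattern */
    return (int32_t)(uint32_t) bits;       /* keep the low 32 bits */
}

/* Scale a float sample to 16-bit with rounding and clamping. */
static int16_t sample_to_int16_sketch(float s)
{
    int32_t v = fast_round_sketch((double) s * 32768.0);
    if (v >  32767) v =  32767;
    if (v < -32768) v = -32768;
    return (int16_t) v;
}
```

Unlike a plain cast, which truncates toward zero, this rounds small negative values such as -0.6 to -1 rather than 0. That is at least consistent with the residual error dropping to "mostly 0" after the change.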
record page sequence number so we can check it
MODIFY start_decoder()   pump the first frame of audio data by calling vorbis_get_frame()

==== 05:33 AM Wednesday, April 04, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 11:57 AM Wednesday, April 04, 2007

OPTIMIZE: O(1) huffman decode for smaller tokens
   notes: basically the same as zlib & jpeg
OPTIMIZE: binary search huffman decode
   notes: similar to zlib, need to bit-reverse because the bitpacking is at the wrong end from optimal
   notes: don't include the huffman symbols for the above "smaller tokens" for a tiny win
OPTIMIZE: precompute IMDCT twiddle factors during setup
OPTIMIZE: precompute IMDCT windows during setup
   notes: only multiply the parts of the windows that are non-0, non-1

At this point it's sane (not doing _hugely_ stupid things).

OPTIMIZE: make fast-huffman decode a macro, move it up into all callers
   notes: big win!

Cool. stb_vorbis is now as fast as libvorbis (or oggdrop, anyway)!

Oh wait, it turns out that my performance testing was causing the convert-to-int code to be discarded, and I'm not actually as fast as libvorbis if I do the convert-to-int. Getting there, though.

==== 2:17 PM Wednesday, April 04, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 5:44 PM Wednesday, April 04, 2007

Ok, wait, I seem to have regressed--I'm not +-1 relative to libvorbis anymore...

BUGFIX: initialize step2_flag[0] and step2_flag[1]
   notes: not actually a regression--also happens in the file I saved away before
   notes: I spent like 6 hours tracking this down, because it worked fine in debug (due to dumb luck) but not in opt/release, but due to the way I was set up (only actually saving the decoded file in release) I _didn't know_ it was opt-only, so I was madly comparing internal state in debug builds to the logged expected state, and it all looked right! Because it WAS right--in the debug builds where I could see it. Grr.
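A minimal sketch of the O(1) huffman decode for short codes logged earlier in this session, in the zlib/jpeg style: a table indexed by the next few bits of the stream. All names and the FAST_BITS value are hypothetical. It assumes codewords are supplied MSB-first with their lengths, and that the stream is read LSB-first, which is why each code is bit-reversed before filling the table.

```c
#include <stdint.h>

#define FAST_BITS 4                    /* illustrative; real tables are larger */
#define FAST_SIZE (1 << FAST_BITS)

static uint32_t bit_reverse32(uint32_t n)
{
    n = ((n & 0xAAAAAAAAu) >> 1) | ((n & 0x55555555u) << 1);
    n = ((n & 0xCCCCCCCCu) >> 2) | ((n & 0x33333333u) << 2);
    n = ((n & 0xF0F0F0F0u) >> 4) | ((n & 0x0F0F0F0Fu) << 4);
    n = ((n & 0xFF00FF00u) >> 8) | ((n & 0x00FF00FFu) << 8);
    return (n >> 16) | (n << 16);
}

/* Fill fast[] so that any FAST_BITS-wide peek whose low bits match a
   short code maps straight to that symbol; -1 marks "not a short code". */
static void build_fast_table(int16_t *fast, const uint32_t *codes,
                             const uint8_t *lens, int count)
{
    int i;
    for (i = 0; i < FAST_SIZE; ++i) fast[i] = -1;
    for (i = 0; i < count; ++i) {
        uint32_t z;
        if (lens[i] == 0 || lens[i] > FAST_BITS) continue;
        z = bit_reverse32(codes[i]) >> (32 - lens[i]);  /* LSB-first code */
        for (; z < FAST_SIZE; z += 1u << lens[i])
            fast[z] = (int16_t) i;
    }
}

/* Decode one symbol from an LSB-first bit accumulator. Returns the
   symbol, or -1 if the code is too long for the table or bits ran out. */
static int decode_fast(const int16_t *fast, const uint8_t *lens,
                       uint32_t *acc, int *valid_bits)
{
    int sym = fast[*acc & (FAST_SIZE - 1)];
    if (sym < 0 || lens[sym] > *valid_bits) return -1;
    *acc >>= lens[sym];
    *valid_bits -= lens[sym];
    return sym;
}
```

A -1 result is where a slower path (such as the binary search over longer codes noted above) would take over.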
==== 11:21 PM Wednesday, April 04, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 2:34 AM Thursday, April 05, 2007

OPTIMIZE: make codebook VQ stop decoding to a temporary buffer
   notes: I was thinking I would have to inline the codebook in each place it's called, but it turns out every place works identically: it _adds_ the entries to another array! so optimizing this is trivial!
   notes: ok, not quite so trivial because one of them steps weirdly, and I still probably have to inline it for type=2 to avoid the extra pass for deinterleaving, which is the only case that's actually running
OPTIMIZE: defer final floor computation until _after_ residue
   reference: vorbis spec-1.3.2.7
   notes: the data fed into the IMDCT is: floor * res
          To avoid an extra temp buffer, we need to compute floor *= decoupled(res), or res *= floor. We can't do floor *= res, since res is accumulated and decoupled; deferring floor decode lets us do res *= floor. Need temp buffers to store the semi-decoded floor1, but they're tiny.
   notes: can finally switch residue to decoding to main buffers instead of their own buffers, get rid of a ton of buffers
OPTIMIZE: codebook VQ for residue type 2
   notes: still trying to strip out temporary memory usage
   notes: kind of messy, having to 'if (++ch == max)'
   notes: push down one level of looping into this version of the codebook decoder to reduce overhead (cleaner than inlining it)

Appears to be faster than oggdrop if I _do_ convert to int, but I _don't_ write it to a file. Still want to reduce temporary storage further so not stopping here.

OPTIMIZE: move window multiply to the code that mixes previous window
   notes: saves an unnecessary pass over the data
OPTIMIZE: split cases for sequence_p
   notes: it looks like sequence_p is really geared to floor0, so we could strip out all the code for it if we're sure it's not needed
OPTIMIZE: speed up IMDCT
   notes: remove all but one temp buffer... note, is it 2x what we need? i.e.
          are these buffers being used sparsely and we could pack them better?

Faster than oggdrop WITH converting to int AND writing it to a file! Yay.

==== 5:09 AM Thursday, April 05, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 7:48 AM Thursday, April 05, 2007

WRITE make_block_array(), temp_block_array(), temp_alloc()
   notes: functions for allocating temporary data, instead of huge stack things
CHANGE start_decoder(), vorbis_free()--malloc() several of the subarrays that are otherwise huge

Examine the memory usage and realize that the soundforge-output test ogg file has a HUGE sparse codebook... I need to use a sparse representation for that. Fast huffman + binary search is TOTALLY the necessary sparse representation, too, except for how it maps to the codebook, so it should be easy. Except, man, the setup is a pain, and I'm sleepy.

==== 10:51 AM Thursday, April 05, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 6:33 PM Thursday, April 05, 2007

Work on sparse encoding. Write giant document vorbis_codebook.txt to make sure I know what I'm doing and to have something to refer to while coding.

==== 8:40 PM Thursday, April 05, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 6:53 AM Friday, April 06, 2007

Work on sparse encoding. Ok, got everything but the sparse codebook working

==== 7:42 AM Friday, April 06, 2007

Completed the sparse encoding of codebooks.

Work on push-data-into-library API.

==== 10:54 AM Friday, April 06, 2007

Finished coding, time to debug.

Ok, pushing data streams is working, still need to do seek recovery.
==== 12:31 PM Friday, April 06, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 2:55 PM Friday, April 06, 2007

WRITE: vorbis_search_for_page_pushdata()   seek/flush recovery for pushdata mode
   notes: crc implementation altered from stb.h according to document referenced by ogg spec: http://www.ross.net/crc/download/crc_v3.txt (and some trial & error)

==== 4:42 PM Friday, April 06, 2007
------------- END SESSION ---------------

------------ START SESSION --------------
==== 11:55 PM Friday, April 06, 2007

Keep track of the current sample location, and update with the ogg granulepos; then truncate the final frame based on it.
   notes: specification is ambiguous; required trial-and-error
   notes: (although this was made worse by my "cleverly" always returning as much data as is ready, instead of just to the center of the frame)

==== 2:33 AM Saturday, April 07, 2007

------------ START SESSION --------------

TODO OPTIMIZE: get rid of the qsort() call in the floor function
   notes: precompute neighbors, precompute sorted iteration order

==================================================================================
==================================================================================
------------- END SESSION ---------------

------------ START SESSION --------------