References marked with "*" mean the section named has an
initial section and subsections, but the reference here is
only to the initial section. Otherwise a reference means
all subsections as well.

If I code an entire function, I annotate it "WRITE" (I may
change it later). If I only code part of it, I annotate it
"START", and then when I return to it, "MORE", or "FINISH"
for when it's conceptually done (again, I may change it
later).  (If I stop writing a function to write a little
leaf function it calls, I don't bother indicating this as
a START/FINISH, I just log them separately.)


------------ START SESSION --------------

==== 2:11 AM Monday, April 02, 2007

Instant Messenger conversation with Casey (the IM log is provided
elsewhere) for gathering intermediate data that's clearly defined
by the specification to use to help validate individual subsystems
of the new implementation. (Continues, overlapping with the entries below.)

== 2:30 AM Monday, April 02, 2007

WRITE start_page() ogg page header parser
  reference: ogg spec: "page header"

WRITE capture_pattern() parse just the first 4 header bytes
  reference: ogg spec: "page header"

WRITE error() error-handling convenience function

WRITE vorbis_validate() check a vorbis-packet header
  reference: vorbis spec--4.2.1

WRITE get8()   various file/memory reading functions
WRITE get32()  (endian-independent)
WRITE getn()
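
A minimal sketch of what "endian-independent" can mean for a 32-bit read:
build the value byte by byte instead of casting a pointer, so the result is
the same on any host. Ogg header fields are stored least-significant byte
first. The name read_le32() and the memory-pointer interface are assumptions
for illustration, not the actual get32().

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit value from four bytes.
   Works the same on big- and little-endian hosts because it
   never reinterprets memory as a wider type. */
static uint32_t read_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```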
        
START start_decoder() vorbis file header parser--packet 1
  reference: vorbis spec--4.2.2
  notes: discard granule position for now
  notes: discard sequence number, serial number for now

WRITE vorbis_alloc()   allocate and zero memory for the main structure
WRITE vorbis_free()    free structure and all pointers inside it

WRITE vorbis_file()
WRITE vorbis_filename()
WRITE vorbis_memory()
  allocate a vorbis structure and call start_decoder(), handle errors

WRITE start_packet()
WRITE next_segment()
  reference: ogg spec--"packet segmentation"
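
Sketch of the segmentation rule from the ogg spec: a packet is a run of
segments, and the run ends at the first lacing value below 255. If the
segment table runs out first, the packet continues on the next page.
packet_length() and its parameters are hypothetical names; this is not
the actual start_packet()/next_segment() pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum lacing values starting at *pos until one is < 255 (end of packet).
   Returns the number of packet bytes found on this page and advances *pos.
   *complete is set to 1 if the packet ended on this page, 0 if the segment
   table was exhausted first (packet continues on the next page). */
static size_t packet_length(const uint8_t *lacing, size_t count,
                            size_t *pos, int *complete)
{
    size_t len = 0;
    *complete = 0;
    while (*pos < count) {
        uint8_t v = lacing[(*pos)++];
        len += v;
        if (v < 255) {
            *complete = 1;
            break;
        }
    }
    return len;
}
```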

WRITE skip()  another file-reading function
  
MORE start_decoder() packet 2: skip the "comment" packet
  reference: vorbis spec--4.2.3
  notes: does RAD want it?

== 4:18 AM Monday, April 02, 2007

WRITE get8_packet()

WRITE get_bits() arbitrary-bit-length integer reader from packet
  reference: vorbis spec--2.1.4
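
Sketch of the bit order used by the vorbis bitpacker: bits are consumed from
the least-significant end of each byte, and the first bit read becomes the
least-significant bit of the result. This version reads one bit at a time
from a memory buffer for clarity. The bit_reader struct and read_bits() are
assumptions; the real get_bits() works on packet data and is presumably
less naive.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;
    size_t size;     /* bytes available */
    size_t bitpos;   /* next bit to read, counted from the start of data */
} bit_reader;

/* Read n bits (0..32), LSB first. Bits past the end of data read as 0. */
static uint32_t read_bits(bit_reader *r, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; ++i) {
        size_t byte = r->bitpos >> 3;
        if (byte >= r->size)
            break;
        uint32_t bit = (uint32_t)(r->data[byte] >> (r->bitpos & 7)) & 1u;
        v |= bit << i;
        r->bitpos++;
    }
    return v;
}
```

The bytes FC 48 CE 06 hold the values 12, 7, 17 and 6969 packed with widths
4, 3, 7 and 13, which makes a convenient first test.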

WRITE get_bits_signed() signed-version of get_bits()

WRITE start_decoder() packet 3: setup packet
  reference: vorbis spec--4.2.4
  reference: vorbis spec--3.2.1*, 3.2.2.2
  reference: vorbis spec--6.2.1
  reference: vorbis spec--7.2.2*
  reference: vorbis spec--8.6.1
  SPECIFICATION ISSUE: What to do with mapping->mux[] if submaps <= 1?
  notes: come back to it when we have code using it

WRITE ilog() integer log, modified from stb.h (PD)
  notes: have to add one to all the table entries, and handle signed values
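
For reference, the behavior the spec asks for: ilog(x) is the position of the
highest set bit, counting from 1, and 0 for zero or negative inputs. The
sketch below uses a plain loop rather than the table-driven stb.h version
mentioned above, so it shows the expected results, not the actual code.

```c
#include <stdint.h>

/* Vorbis-style integer log: ilog(0)=0, ilog(1)=1, ilog(2)=2, ilog(3)=2,
   ilog(4)=3, ilog(7)=3; negative inputs return 0. */
static int ilog(int32_t x)
{
    int n = 0;
    if (x <= 0)
        return 0;
    for (uint32_t u = (uint32_t)x; u != 0; u >>= 1)
        ++n;
    return n;
}
```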

WRITE float32_unpack() unpack a 32-bit float
  reference: vorbis spec--9.2.2
  notes: is this just a little-endian IEEE float? maybe? try it.
        
WRITE lookup1_values() determine # of values for a codebook-lookup-mode-1
  reference: vorbis spec--9.2.3
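
Sketch of the definition: the result is the greatest integer r for which
r to the power of the codebook dimension does not exceed the entry count.
Integer-only on purpose, to avoid floating-point rounding at exact powers.
pow_fits() is a hypothetical helper, and the loop is the simplest thing that
works, not necessarily what the real lookup1_values() does.

```c
/* Returns 1 if base^dim <= limit, checking after every multiply so the
   intermediate product never grows far past limit. */
static int pow_fits(long long base, int dim, long long limit)
{
    long long p = 1;
    for (int i = 0; i < dim; ++i) {
        p *= base;
        if (p > limit)
            return 0;
    }
    return 1;
}

/* Greatest r such that r^dim <= entries (0 if dim < 1 or entries < 1). */
static int lookup1_values(int entries, int dim)
{
    int r = 0;
    if (dim < 1)
        return 0;
    while (pow_fits((long long)r + 1, dim, entries))
        ++r;
    return r;
}
```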

WRITE compute_codewords() compute the actual huffman encodings
  reference: vorbis spec--3.2.1.1
  SPECIFICATION ISSUE: specification is slightly ambiguous: lengths
     can be out of order, but in all the examples the huffman numbering
     and the entries are in the same order. If this is an actual limitation,
     there's a much simpler implementation.
  notes: try the easy implementation

WRITE bit_reverse() 32-bit bit reversal from stb.h (PD)
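
A common way to write a 32-bit bit reversal is to swap progressively larger
groups: adjacent bits, then pairs, nibbles, bytes, and finally the two
halves. This is a sketch of that standard technique; the stb.h version may
differ in detail.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits in n. */
static uint32_t bit_reverse(uint32_t n)
{
    n = ((n & 0xAAAAAAAAu) >> 1) | ((n & 0x55555555u) << 1);
    n = ((n & 0xCCCCCCCCu) >> 2) | ((n & 0x33333333u) << 2);
    n = ((n & 0xF0F0F0F0u) >> 4) | ((n & 0x0F0F0F0Fu) << 4);
    n = ((n & 0xFF00FF00u) >> 8) | ((n & 0x00FF00FFu) << 8);
    return (n >> 16) | (n << 16);
}
```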

Grab DirectSound etc. code from my music composing software,
and use that to write the sample app (so eventually we can
make it play the audio data).

Call vorbis_file() with it.

Switch to using vorbis_memory() so I can look at the entire
stream while debugging.

BUGFIX start_decoder()
  a few typos/brainos etc.

WRITE flush_packet() skip to end of current packet
WRITE get8_packet_raw() split out from get8_packet() to implement flush_packet()

==== 7:39 AM Monday, April 02, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 4:13 PM Monday, April 02, 2007

WRITE start_page_no_capturepattern() split out from start_page(), allow seeking

START vorbis_decode_packet() read and decode one packet/frame
  reference: vorbis spec-4.3*, 4.3.1, 4.3.2
  notes: decode window into explicit array for now
  reference: vorbis spec-7.2.2.1
  (skip Floor type 0 for now, since it's not used currently)

WRITE codebook_decode_scalar() brute force huffman decoding
WRITE prep_huffman() prepare the bit-buffer for huffman decode

WRITE neighbors() compute adjacent points to interpolate from
  reference: vorbis spec-9.2.4*, 9.2.4.1

WRITE predict_point() called "render_point" in spec
  reference: vorbis spec-9.2.4.2
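
Sketch of the spec's render_point: integer interpolation of the line through
(x0,y0) and (x1,y1) at position x, with the offset always rounded toward y0.
Assumes x1 > x0. Written from the spec's pseudocode, not copied from
predict_point().

```c
/* Integer line interpolation as described for floor 1.
   The offset from y0 is computed on |dy| and truncated, then applied
   with the sign of dy, so rising and falling lines round symmetrically. */
static int render_point(int x0, int y0, int x1, int y1, int x)
{
    int dy  = y1 - y0;
    int adx = x1 - x0;
    int ady = dy < 0 ? -dy : dy;
    int err = ady * (x - x0);
    int off = err / adx;
    return dy < 0 ? y0 - off : y0 + off;
}
```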

MORE vorbis_decode_packet()  more floor-1 code
  notes: probably a better way to do this than calling qsort()..
     the x-coordinates are fixed so can "pre-sort"?

WRITE point_compare() compare points for qsort() above

WRITE draw_line() called "render_line" in spec
  reference: vorbis spec-9.2.4.3
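
Sketch of the spec's render_line: a Bresenham-style integer line that writes
y values for x0 through x1-1 (the endpoint x1 belongs to the next segment).
The output array v and its length n are assumptions added so the sketch can
clip safely; it also assumes x1 > x0 and x0 >= 0. It relies on C integer
division truncating toward zero.

```c
/* Fill v[x0] .. v[x1-1] with the integer line from (x0,y0) to (x1,y1),
   skipping any x at or beyond n. */
static void render_line(int x0, int y0, int x1, int y1, int *v, int n)
{
    int dy   = y1 - y0;
    int adx  = x1 - x0;
    int ady  = dy < 0 ? -dy : dy;
    int base = dy / adx;                 /* truncates toward zero */
    int sy   = dy < 0 ? base - 1 : base + 1;
    int y    = y0;
    int err  = 0;

    ady -= (base < 0 ? -base : base) * adx;

    if (x0 < n)
        v[x0] = y;
    for (int x = x0 + 1; x < x1 && x < n; ++x) {
        err += ady;
        if (err >= adx) {
            err -= adx;
            y += sy;
        } else {
            y += base;
        }
        v[x] = y;
    }
}
```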

cut and paste "floor1 inverse dB static table" from spec.

MORE vorbis_decode_packet()  residue decode
  reference: vorbis spec-4.3.3, 4.3.4
  SPECIFICATION ISSUE: to answer earlier question, it looks like
          mapping->mux should be all 0 if submaps<=1

MORE start_decoder()
  notes: add more per-channel buffers to store the residues into

WRITE decode_residue() residue decoding
  reference: vorbis spec-8.6.2

==== 8:00 PM Monday, April 02, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 11:22 PM Monday, April 02, 2007

WRITE residue_decode() residue decoding low-level
  reference: vorbis spec-8.6.3, 8.6.4

WRITE codebook_decode() decode from codebook in vector context
  reference: vorbis spec-3.2.1.2.1, 3.2.1.2.2

WRITE inverse_mdct() inverse MDCT
  reference: Sporer et al page 2
  notes: naive O(N^2) 
  reference: M_PI from CRC Standard Mathematical Tables & Formulas

FINISH vorbis_decode_packet()
  notes: skip windowing for now

WRITE vorbis_get_frame() simple streaming API
  notes: ignore window overlapping, just return internal buffers

Test jig: call vorbis_get_frame() until it returns a 0 length.
Convert float buffers to 16-bit integer.
  SPECIFICATION ISSUE: (what's the scaling factor? Probably 32767.)

DEBUGGING

REWRITE compute_codewords()
  notes: ok, the huffman stuff is totally broken. probably wrong assumption
  reference: vorbis spec--3.2.1.1
  notes: hey, turns out there's a simple way to do "unsorted" case too

BUGFIX compute_codewords()
  notes: clear available[z] after allocating it
  notes: (1 << (32-y)), not (1 << (32-i))
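
Sketch of one way to assign codewords that handles lengths in any order:
keep, for each length, the lowest free tree node at that depth (stored
left-aligned in 32 bits), take the deepest free node at or above the wanted
length, and then mark the sibling branches below it as free. The two notes
above correspond to the "available[z] = 0" line and the "(32 - y)" shift.
assign_codewords() is a hypothetical name; codes come out MSB-first in the
low bits, without the bit reversal the decoder may want.

```c
#include <stdint.h>

/* lengths[i] is 1..32, or 0 for an unused entry. codes[i] receives the
   codeword in its low lengths[i] bits. Returns 1 on success, 0 if the
   lengths over-specify the tree. */
static int assign_codewords(const uint8_t *lengths, int n, uint32_t *codes)
{
    uint32_t available[33] = {0};
    int k = 0;

    while (k < n && lengths[k] == 0)
        ++k;
    if (k == n)
        return 1;                        /* no used entries */

    /* first used entry gets all zeros; its right-hand siblings are free */
    codes[k] = 0;
    for (int i = 1; i <= lengths[k]; ++i)
        available[i] = 1u << (32 - i);

    for (int i = k + 1; i < n; ++i) {
        int len = lengths[i];
        int z = len;
        if (len == 0)
            continue;
        while (z > 0 && !available[z])
            --z;
        if (z == 0)
            return 0;                    /* nothing free at this depth */
        uint32_t res = available[z];
        available[z] = 0;                /* node is now taken */
        codes[i] = res >> (32 - len);
        /* descending the left branch from depth z to len frees the
           right-hand sibling at every depth in between */
        for (int y = len; y > z; --y)
            available[y] = res + (1u << (32 - y));
    }
    return 1;
}
```

With lengths 2,4,4,4,4,2,3,3 this yields 00, 0100, 0101, 0110, 0111, 10,
110, 111, which matches the worked example in the spec.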

BUGFIX neighbors()
  notes: what, no, you can't just find the smallest and largest elements
  notes: wtf was I on when I wrote this?
  reference: vorbis spec-9.2.4*, 9.2.4.1

BUGFIX float32_unpack() unpack a 32-bit float
  notes: no, this _isn't_ just a little-endian IEEE float. code it.
  reference: vorbis spec--9.2.2
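
Sketch of the unpacking as the spec describes it: a 21-bit integer mantissa,
a 10-bit exponent biased by 788, and a sign bit, which is why it is not an
IEEE float. The scaling is done with a loop to keep the example free of
library dependencies; ldexp() would do the same job.

```c
#include <stdint.h>

/* Unpack the codebook float format:
   bits 0..20 mantissa, bits 21..30 exponent (bias 788), bit 31 sign.
   value = +/- mantissa * 2^(exponent - 788) */
static float float32_unpack(uint32_t x)
{
    double mantissa = (double)(x & 0x1FFFFFu);
    int exponent = (int)((x & 0x7FE00000u) >> 21) - 788;

    if (x & 0x80000000u)
        mantissa = -mantissa;
    while (exponent > 0) { mantissa *= 2.0; --exponent; }
    while (exponent < 0) { mantissa *= 0.5; ++exponent; }
    return (float)mantissa;
}
```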

Ok, successfully output a waveform that resembles the shark fin in
test1.ogg (plus other wonkiness due to non-windowing presumably),
but it's scaled wrong.

==== 4:22 AM Tuesday, April 03, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 8:13 PM Tuesday, April 03, 2007

BUGFIX: range_list must be indexed by multiplier-1, not multiplier
  notes: caught this with the dump of the internal state

BUGFIX: IMDCT output is scaled incorrectly
  reference: wikipedia MDCT
  notes: Sporer et al calls for scaling the IMDCT by the blocksize;
     this is echoed by wikipedia. Neither formula (they look
     mathematically equivalent) produces correct results. It looks
     like different blocks are being scaled differently wrong, though.
     (Internal dump very helpful here, since I can see the input to the
     IMDCT is correct.)
  notes: What eventually worked for me was NOT scaling the IMDCT at all;
     probably the vorbis encoder pre-multiplies the data to "speed up"
     the decoder?
  SPECIFICATION ISSUE: this should be documented in the spec

FIX: vorbis_decode_packet()
  apply the window

FIX: vorbis_get_frame()
  compute and apply the overlapping; discard first frame

Ok, successfully decoded test1.ogg to a buffer. Scaling looks basically
right.

BUGFIX: residue type 2 is totally screwed up. rewrite

Hooray! Successfully decoded test3.ogg to a file. At least it sounds
right.

TESTING: compare output to libvorbis
  notes: libvorbis output computed using oggdrop
  notes: compute delta by inverting and mixing in soundforge
  notes: garbage at the start of oggdrop output; have to delete and tweak to line up
  notes: nearly perfect; every sample is within +-2... (kind of weird)

== 11:57 PM Tuesday, April 03, 2007

BUGFIX: change float-to-int scale to 32768 (from 32767)
  notes: now soundforge shows +-1
  notes: can still hear the music in the +-1
  notes: but maybe that's normal?
  notes: is this just quantization noise? or maybe rounding?

BUGFIX: replace float-to-int (trunc towards 0?) with fast-float-to-int + round
  notes: use old-school store in double, read mantissa trick
  notes: now +- 1 but mostly 0, and just static, can't hear the music in it 
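
Sketch of the "store in double, read mantissa" trick: adding 1.5 * 2^52 to a
double forces the rounded integer into the low mantissa bits, which can then
be read back directly. This assumes IEEE 754 doubles, the default rounding
mode, and plain double arithmetic (no x87 extended precision, no fast-math).
The names fast_round() and float_to_s16() and the clamping are assumptions
for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Round x to the nearest integer using the magic-number trick.
   Valid for |x| well inside the int32 range. */
static int32_t fast_round(double x)
{
    double magic = x + 6755399441055744.0;   /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &magic, sizeof bits);      /* avoid aliasing problems */
    return (int32_t)(uint32_t)bits;          /* low 32 bits hold the result */
}

/* Convert a sample in [-1, 1] to 16-bit, scaling by 32768 and clamping. */
static int16_t float_to_s16(float s)
{
    int32_t v = fast_round((double)s * 32768.0);
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```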

WRITE inverse_mdct_fast(), NlogN 
  reference: Sporer et al, page 2-3

DEBUGGING
  notes: inverse_mdct_fast totally fails
  notes: because of the weird sparseness in arrays, there's probably problems there

BUGFIX: some array entries aren't set
  notes: make parallel structures indicating which entries are valid
  notes: the "bitreverse" step wasn't copying some entries (would be fine if it were in-place)

BUGFIX: some array entries set without being used
  notes: the "butterfly" step (I think?) is overwriting instead of swapping
  notes: probably just left out a swapping step in the notation

BUGFIX: some sign errors in what I typed
  notes: how to debug a fast IMDCT (when you have no idea what the correct intermediate
         values are): set input vector to all 0s and one 1. if the output matches known
         good IMDCT, move the 1 to a different element. If output does not match a known
         good IMDCT, single step through the algorithm and look at each place where the
         value is used (and keep track of it), and double-check those formulae seem sane
         (many values will be 0 at first, so easy to think through)
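
The impulse test described above can be automated. The sketch below feeds
each unit impulse to a trusted transform and a suspect one, and reports the
first input index where the outputs disagree; that index is where to start
single stepping. Everything here is hypothetical scaffolding: the two toy
transforms are stand-ins (a running sum and a deliberately broken copy), not
IMDCTs, and exist only so the harness can be run on its own.

```c
#include <stddef.h>

#define MAX_N 64

typedef void (*transform_fn)(const float *in, float *out, size_t n);

/* Returns the first impulse position whose outputs differ by more than
   tol, -1 if all n impulses match, or -2 if n is too large. */
static int first_impulse_mismatch(transform_fn ref, transform_fn fast,
                                  size_t n, float tol)
{
    float in[MAX_N], a[MAX_N], b[MAX_N];
    if (n > MAX_N)
        return -2;
    for (size_t k = 0; k < n; ++k) {
        for (size_t i = 0; i < n; ++i)
            in[i] = 0.0f;
        in[k] = 1.0f;                    /* all 0s and one 1 */
        ref(in, a, n);
        fast(in, b, n);
        for (size_t i = 0; i < n; ++i) {
            float d = a[i] - b[i];
            if (d < 0.0f)
                d = -d;
            if (d > tol)
                return (int)k;
        }
    }
    return -1;
}

/* Stand-in "known good" transform: running sum. */
static void toy_reference(const float *in, float *out, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) { sum += in[i]; out[i] = sum; }
}

/* Stand-in "buggy" transform: running sum that ignores input 2. */
static void toy_buggy(const float *in, float *out, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (i != 2) sum += in[i];
        out[i] = sum;
    }
}
```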

Fast IMDCT now works! Freeze this version as "oggvorbis_clean.c", since it's now
fast enough to be useable. (2x or 3x too slow, but at least real time.)

record page sequence number so we can check it

MODIFY start_decoder()
  pump the first frame of audio data by calling vorbis_get_frame()

==== 05:33 AM Wednesday, April 04, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 11:57 AM Wednesday, April 04, 2007

OPTIMIZE: O(1) huffman decode for smaller tokens
   notes: basically the same as zlib & jpeg
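
Sketch of the table-lookup idea: peek a fixed number of bits, index a table
that maps every bit pattern to the symbol whose (bit-reversed) codeword is a
prefix of it, then consume only that symbol's length. FAST_BITS is kept tiny
here so the table is easy to inspect; the value, the names and the
accumulator interface are all assumptions. Codewords longer than FAST_BITS
are left at -1 and would fall through to a slower path.

```c
#include <stdint.h>

#define FAST_BITS 4
#define FAST_SIZE (1 << FAST_BITS)

/* Reverse the low len bits of code, since the stream is read LSB first. */
static uint32_t reverse_low_bits(uint32_t code, int len)
{
    uint32_t r = 0;
    for (int i = 0; i < len; ++i) {
        r = (r << 1) | (code & 1u);
        code >>= 1;
    }
    return r;
}

/* codes[] are MSB-first codewords, lengths[] their bit lengths (0 = unused). */
static void build_fast_table(const uint32_t *codes, const uint8_t *lengths,
                             int n, int16_t *table)
{
    for (int i = 0; i < FAST_SIZE; ++i)
        table[i] = -1;
    for (int s = 0; s < n; ++s) {
        int len = lengths[s];
        if (len == 0 || len > FAST_BITS)
            continue;
        /* every table index whose low len bits equal the codeword */
        for (uint32_t z = reverse_low_bits(codes[s], len);
             z < FAST_SIZE; z += 1u << len)
            table[z] = (int16_t)s;
    }
}

/* Decode one symbol from the bit accumulator, or return -1 for the slow path. */
static int fast_decode(const int16_t *table, const uint8_t *lengths,
                       uint32_t *acc, int *valid_bits)
{
    int s = table[*acc & (FAST_SIZE - 1)];
    if (s < 0)
        return -1;
    *acc >>= lengths[s];
    *valid_bits -= lengths[s];
    return s;
}
```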

OPTIMIZE: binary search huffman decode
   notes: similar to zlib, need to bit-reverse because the bitpacking
      is at the wrong end from optimal
   notes: don't include the huffman symbols for the above "smaller tokens"
      for a tiny win

OPTIMIZE: precompute IMDCT twiddle factors during setup

OPTIMIZE: precompute IMDCT windows during setup
   notes: only multiply the parts of the windows that are non-0, non-1

At this point it's sane (not doing _hugely_ stupid things).

OPTIMIZE: make fast-huffman decode a macro, move it up into all callers
   notes: big win!

Cool. stb_vorbis is now as fast as libvorbis (or oggdrop, anyway)!

Oh wait, it turns out that my performance testing was causing the
convert-to-int code to be discarded, and I'm not actually as fast
as libvorbis if I do the convert-to-int. Getting there, though.

==== 2:17 PM Wednesday, April 04, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 5:44 PM Wednesday, April 04, 2007

Ok, wait, I seem to have regressed--I'm not +-1 relative to libvorbis anymore...

BUGFIX: initialize step2_flag[0] and step2_flag[1]
   notes: not actually a regression--also happens in the file I saved away before
   notes: I spent like 6 hours tracking this down, because it worked fine
      in debug (due to dumb luck) but not in opt/release, but due to the way
      I was set up (only actually saving the decoded file in release) I
      _didn't know_ it was opt-only, so I was madly comparing internal state
      in debug builds to the logged expected state, and it all looked right!
      Because it WAS right--in the debug builds where I could see it. Grr.

==== 11:21 PM Wednesday, April 04, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 2:34 AM Thursday, April 05, 2007

OPTIMIZE: make codebook VQ stop decoding to a temporary buffer
   notes: I was thinking I would have to inline the codebook in
          each place it's called, but it turns out every place works
          identically: it _adds_ the entries to another array! so
          optimizing this is trivial!
   notes: ok, not quite so trivial because one of them steps weirdly,
          and I still probably have to inline it for type=2 to avoid
          the extra pass for deinterleaving, which is the only
          case that's actually running

OPTIMIZE: defer final floor computation until _after_ residue
   reference: vorbis spec-1.3.2.7
   notes: the data fed into the IMDCT is: floor * res
          To avoid an extra temp buffer, we need to compute
              floor *= decoupled(res), or res *= floor.
          We can't do floor *= res, since res is accumulated
          and decoupled; deferring floor decode lets us do
          res *= floor. Need temp buffers to store the
          semi-decoded floor1, but they're tiny.
   notes: can finally switch residue to decoding to main buffers
          instead of their own buffers, get rid of a ton of buffers

OPTIMIZE: codebook VQ for residue type 2
   notes: still trying to strip out temporary memory usage
   notes: kind of messy, having to 'if (++ch == max)'
   notes: push down one level of looping into this version of the codebook
          decoder to reduce overhead (cleaner than inlining it)

Appears to be faster than oggdrop if I _do_ convert to int, but I _don't_
write it to a file. Still want to reduce temporary storage further so not
stopping here.

OPTIMIZE: move window multiply to the code that mixes previous window
  notes: saves an unnecessary pass over the data

OPTIMIZE: split cases for sequence_p
  notes: it looks like sequence_p is really geared to floor0, so we could
         strip out all the code for it if we're sure it's not needed

OPTIMIZE: speed up IMDCT
  notes: remove all but one temp buffer... note, is it 2x what we need?
         i.e. are these buffers being used sparsely and we could pack them better?

Faster than oggdrop WITH converting to int AND writing it to a file! Yay.

==== 5:09 AM Thursday, April 05, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 7:48 AM Thursday, April 05, 2007

WRITE make_block_array(), temp_block_array(), temp_alloc()
  notes: functions for allocating temporary data, instead of huge stack things

CHANGE start_decoder(), vorbis_free()--malloc() several of the subarrays that are otherwise huge

Examine the memory usage and realize that the soundforge-output test ogg file
has a HUGE sparse codebook... I need to use a sparse representation for that.
Fast huffman + binary search is TOTALLY the necessary sparse representation, too,
except for how it maps to the codebook, so it should be easy.

Except, man, the setup is a pain, and I'm sleepy.

==== 10:51 AM Thursday, April 05, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 6:33 PM Thursday, April 05, 2007

Work on sparse encoding. Write giant document vorbis_codebook.txt
to make sure I know what I'm doing and to have something to refer to
while coding.

==== 8:40 PM Thursday, April 05, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 6:53 AM Friday, April 06, 2007

Work on sparse encoding.

Ok, got everything but the sparse codebook working

==== 7:42 AM Friday, April 06, 2007

Completed the sparse encoding of codebooks.

Work on push-data-into-library API.

==== 10:54 AM Friday, April 06, 2007

Finished coding, time to debug.

Ok, pushing data streams is working, still need to do seek recovery.

==== 12:31 PM Friday, April 06, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 2:55 PM Friday, April 06, 2007

WRITE: vorbis_search_for_page_pushdata() seek/flush recovery for pushdata mode
  notes: crc implementation altered from stb.h according to document
         referenced by ogg spec: http://www.ross.net/crc/download/crc_v3.txt
         (and some trial & error)
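
For reference, a bitwise sketch of the checksum ogg pages use: polynomial
0x04C11DB7, processed most-significant bit first, initial value 0, no final
XOR. When checking a page, the CRC field itself is treated as zero. This is
the slow table-free form, written for clarity; the name ogg_crc32() and the
incremental interface are assumptions, and the actual code is table-driven.

```c
#include <stddef.h>
#include <stdint.h>

/* Update crc with len bytes of data. Start with crc = 0; the result can be
   passed back in to continue over a second buffer. */
static uint32_t ogg_crc32(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        crc ^= (uint32_t)data[i] << 24;
        for (int b = 0; b < 8; ++b)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u
                                      : (crc << 1);
    }
    return crc;
}
```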

==== 4:42 PM Friday, April 06, 2007

------------- END SESSION ---------------


------------ START SESSION --------------

==== 11:55 PM Friday, April 06, 2007

Keep track of the current sample location, and update
with the ogg granulepos; then truncate the final frame
based on it.
  notes: specification is ambiguous; required trial-and-error
  notes: (although this was made worse by my "cleverly" always
         returning as much data as is ready, instead of just
         to the center of the frame)

==== 2:33 AM Saturday, April 07, 2007

------------ START SESSION --------------


TODO OPTIMIZE: get rid of the qsort() call in the floor function
  notes: precompute neighbors, precompute sorted iteration order
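
Sketch of the "precompute sorted iteration order" half of this TODO: since
the floor x-coordinates are fixed at setup time, the sort can run once there
and the decoder can just walk the stored order. The struct, the MAX_POINTS
limit and compute_sorted_order() are hypothetical names; qsort() is still
used, but only during setup.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_POINTS 256

typedef struct {
    uint16_t x;    /* x-coordinate from the floor setup */
    uint16_t id;   /* original index of that coordinate */
} floor_point;

static int point_compare(const void *p, const void *q)
{
    const floor_point *a = (const floor_point *)p;
    const floor_point *b = (const floor_point *)q;
    return (a->x > b->x) - (a->x < b->x);
}

/* Fill order[] with the indices of xlist[] in ascending x order.
   Returns 1 on success, 0 if n is out of range. */
static int compute_sorted_order(const uint16_t *xlist, int n, uint16_t *order)
{
    floor_point pts[MAX_POINTS];
    if (n < 0 || n > MAX_POINTS)
        return 0;
    for (int i = 0; i < n; ++i) {
        pts[i].x = xlist[i];
        pts[i].id = (uint16_t)i;
    }
    qsort(pts, (size_t)n, sizeof pts[0], point_compare);
    for (int i = 0; i < n; ++i)
        order[i] = pts[i].id;
    return 1;
}
```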


==================================================================================
==================================================================================


------------- END SESSION ---------------


------------ START SESSION --------------
 