Codebook representation ======================= Some effort was exerted in balancing the representation of codebooks to allow for both speed and size efficiency, particularly in the handling of sparse codebooks. (Note: some sparse codebooks aren't that sparse, and OUGHT to go the normal route, but this was easier and the difference is minimla.) Codebooks consist of two components: huffman scalar decode (mapping variable-length numbers to small integers) VQ vector decode (mapping small integers to known vectors) Some codebooks are "sparse". For our size-aggressive decode strategy, representing sparse codebooks the normal way explodes in size (one codebook in a test file I looked at had only 81 of 6561 symbols defined, and took 180KB of storage using a non-sparse representation). For the huffman scalar decode, there are three possible integers: codebook symbol -- the integer value represented huffman symbol -- an arbitrary variable-bit-width integer sorted index -- the index of the entry in the sorted_codewords array We store the following fields for each codebook: entries: the "number" of codebook symbols (i.e.. the largest one + 1) sorted_entries: the number of entries in the 'sorted_' tables: if !sparse, the number of symbols not determined by fast-huffman decode if sparse, the number of present (non-sparse) symbols codeword_lengths: if !sparse, the huffman symbol lengths indexed by codebook symbol if sparse, the huffman symbol lengths indexed by sorted index codewords: if !sparse, the huffman symbols indexed by codebook symbol if sparse, NULL (during build, the huffman symbols indexed by build order) fast_huffman: if !sparse, the codebook symbol "indexed" by huffman symbol if sparse, the sorted index "indexed" by huffman symbol sorted_codewords: if !sparse, if STB_VORBIS_HUFFMAN_BINARY_SEARCH, huffman symbols in sorted order else NULL if sparse, all huffman symbols in sorted order sorted_values: if !sparse, if STB_VORBIS_HUFFMAN_BINARY_SEARCH, codebook symbols indexed by sorted index else NULL if sparse, codebook symbols indexed by sorted index Note that generally, all the tables are indexed by the same thing, _except_ the codeword_lengths table. Sparse codebooks have no arrays that are indexed by codebook symbol (since codebook symbol is the thing that is sparse). If STB_VORBIS_NO_DIVIDES_IN_CODEBOOK is defined, then sparse vector codebook accesses begin by determining the codebook symbol as normal (then precede to mod/divide). If STB_VORBIS_NO_DIVIDES_IN_CODEBOOK is NOT defined, then we are required to pre-expand the symbols into their VQ vectors; but we want to do this "sparsely". So we "compact" the pre-expanded codebook so that it is indexed by the 'sorted index' instead of by the 'codebook symbol'. To make this work, the sparse vector codebook access must begin by determining the _sorted index_ without converting it to the codebook symbol. Therefore (a) it calls a different decode function that reports the sorted index; (b) sparse lookups on the fast huffman need to return the sorted index, not the codebook symbol, so that's what we store in the fast huffman; (c) we HAVE to use the binary search to decode a sparse entry.