Suppose we're trying to render truetype fonts to bitmaps without calling malloc(). makeCharacterBitmap() performs rasterization. makeCharacterBitmapMemoryNeeded() computes how much temp memory to pass in.

How do we implement the latter function?

Well, we have many needs for memory, but the very last one we'll need is for the active edge list during rasterization. Its size depends on the number of simultaneous edges on a single scanline, which depends on the character. (99% of users will never see more than 100, or maybe even 10 or 20, but a font COULD have anything in it.) Specifically, it depends on the edge list, which means we already need enough temp memory to build the edge list. We won't have memory to store the active edge list, so computing how big the active edge list *would be* probably requires heroic programming, or maybe unavoidably takes a significant performance hit as we have to rescan the edge list. (Possibly even with heroic programming it's impossible to avoid O(N^2) performance if you can't have another data structure.)

So the client will have to pass makeCharacterBitmapMemoryNeeded() enough memory to compute the edge list. So how much memory is that? makeCharacterBitmapMemoryNeededMemoryNeeded() will compute the memory needed for the above function.

How do we implement this?

Well, we need to get the tessellated edge list. How big is the tessellated edge list? How do we build the tessellated edge list? The way this works in stb_truetype is that first we build a list of all the curves in the shape, and then we tessellate it. If we stick with that approach, then this function still needs to build that list of curves. So we need memory to store those curves, so we need makeCharacterBitmapMemoryNeededMemoryNeededMemoryNeeded().

Of course you could directly compute the count of tessellated edges from each curve without building up the full list of curves explicitly. This does not require heroic programming, but it does cost you some performance.
That's because truetype doesn't store coordinates as (x,y) pairs. Instead, for a given shape, it stores all of the x coordinates, and then all of the y coordinates, each of varying length. So if you want to visit all the curves without storing them, you have to fully parse all the x coordinates to find the start of the y coordinates, and then start over and simultaneously parse out the x coordinates and the y coordinates. Except it's more complicated than that; first there's an array of N flags, then N x coords, then N y coords, where the flags control how you decode the x & y coords. This isn't a *huge* performance suck, especially since each of the other functions above is going to redo this decoding too, but it still means writing the code for this pass in a much uglier way.

So, for the naivest approach (without changing anything in stb_truetype), we'd require:

    makeCharacterBitmapMemoryNeededMemoryNeededMemoryNeeded()
    makeCharacterBitmapMemoryNeededMemoryNeeded()
    makeCharacterBitmapMemoryNeeded()
    makeCharacterBitmap()

With varying amounts of rewriting of the functions, we could reduce the number of these that are needed. We can even avoid the active edge list mess by simply requiring the client to pass the max # of edges on a single scanline to makeCharacterBitmapMemoryNeeded() and using that to size the memory (and the burden is on the client to set that correctly).

But what is all that programming in service of?

You, the client, are either going to pass in some pre-allocated memory buffer of fixed size (or, conceptually, a correctly-sized portion of that), or you're going to call malloc and return that to us. And then internally, our library is going to take the temporary memory you pass in and make an arena and suballocate from it.

Except wait, our library doesn't actually need all that memory at the same time. We'll have freed up the curve list by the time we have the active edge list, so those can come from the same memory.
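
To make "make an arena and suballocate from it" concrete, here is a minimal sketch of a bump arena over a client-supplied block. All names here (Arena, arena_init, arena_alloc, ARENA_ALIGN) are made up for illustration and are not part of stb_truetype; the 16-byte alignment is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not stb_truetype's API. */
#define ARENA_ALIGN 16u  /* assumed alignment, enough for common scalar types */

typedef struct {
    unsigned char *base;  /* start of the client-supplied temp block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far, including padding */
} Arena;

static void arena_init(Arena *a, void *mem, size_t size)
{
    a->base = (unsigned char *)mem;
    a->size = size;
    a->used = 0;
}

/* Hands out the next n bytes, aligned to ARENA_ALIGN.
   Returns NULL when the block is exhausted. Nothing is ever freed. */
static void *arena_alloc(Arena *a, size_t n)
{
    uintptr_t p = (uintptr_t)(a->base + a->used);
    size_t pad = (size_t)((ARENA_ALIGN - (p % ARENA_ALIGN)) % ARENA_ALIGN);
    size_t left = a->size - a->used;

    if (pad > left || n > left - pad)
        return NULL;
    a->used += pad + n;
    return a->base + a->used - n;
}
```

Because a bump arena like this never gives memory back, the curve list would still be occupying its bytes when the active edge list gets allocated, even though the two are never needed at the same time.
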
So, if we want to *minimize* memory usage, we actually need to use a dynamic allocator internally. So, whether you call malloc or use a fixed-memory block, we're going to internally do something equivalent to malloc. (Actually, in this specific case, you might be able to just allocate from the beginning and end of the block, growing towards the middle.)

So, in stb_truetype, rather than make N passes over things to figure out those sizes in advance (when you're either going to pass in an *independently-sized fixed-size reserved block*, or just going to call malloc), we just say "hey, you can either let us call malloc, or you can make your own little system to 'malloc' out of your temporary block and pass that to us".

That keeps our performance *higher*, and *induces exactly the same amount of fragmentation it would have* (i.e. none, because it's fragmenting this temp memory that we don't care about). It just pushes the complexity onto you.

[[ This is clearer with other types of libraries, which have to do significant work to determine things. For example, physics systems need to keep a list of contacts between objects touching each other; there is a variable number of such contacts, and determining how many are needed requires *running the simulation to that point*. A function to determine how many are needed would itself have to run the simulation to the end, and that function needs enough memory to run the simulation to just before the end. I.e. you'd end up with a sort of "iterative deepening" physics that you would call once per memory allocation.

In fact, at this point (and you can see this in the stb_truetype pattern as well), it would make more sense from a performance standpoint to, instead of running N functions that repeat all the same work and each get a little further in determining how much memory is needed, have each of the N functions *reuse* the work from the (N-1)th function, i.e. each function is really continuing where the previous one left off.
But what that means is that your client code boils down to:

    void *newmem = NULL;
    size_t newsize = 0;
    for(;;) {
       int code = apiFunctionPartial(..., newmem, &newsize);
       if (code != NEEDS_MORE_MEMORY) break;
       newmem = malloc(newsize);
    }

And at this point, you could get the same effect with a lot less library complexity by simply passing in an allocator function. This is what stb_truetype does, except the allocator function isn't passed in, it's a #define. ]]

Back to pushing the complexity onto the client: stb_truetype could instead provide this private-allocator-from-temp-mem itself so you don't have to. And so could every other library you call that has similar behavior. But since you're the person scared of dynamic allocations, I'm perfectly ok with pushing that complexity onto you, rather than requiring every library you might want to use to each independently handle that complexity.

And, to be honest, stb_truetype *does* already take a performance hit in the name of simplifying memory management -- to avoid you having to define a *realloc* (alt: to avoid potentially needing 2.9X the memory due to realloc'ing), stb_truetype actually tessellates the curves twice -- the first time so it can find out how big the final array is, and the second time to fill it out.
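
The count-then-fill pattern mentioned above can be sketched roughly as follows. This is a simplified illustration for a single quadratic curve, not stb_truetype's actual code; the names (Point, emit_point, flatten_quad, flatten_two_pass), the midpoint flatness test, and the depth limit of 16 are all assumptions made for the example.

```c
#include <stdlib.h>

typedef struct { float x, y; } Point;

/* Append a point. When out is NULL we only count (first pass). */
static void emit_point(Point *out, int *n, float x, float y)
{
    if (out) { out[*n].x = x; out[*n].y = y; }
    ++*n;
}

/* Recursively flatten one quadratic Bezier (start, control, end).
   Emits the end point of every piece that is flat enough. */
static void flatten_quad(Point *out, int *n,
                         float x0, float y0, float cx, float cy,
                         float x1, float y1, float tol2, int depth)
{
    /* squared distance from the curve's midpoint to the chord's midpoint */
    float mx = (x0 + 2*cx + x1) / 4, my = (y0 + 2*cy + y1) / 4;
    float dx = (x0 + x1) / 2 - mx,   dy = (y0 + y1) / 2 - my;

    if (depth < 16 && dx*dx + dy*dy > tol2) {
        flatten_quad(out, n, x0, y0, (x0+cx)/2, (y0+cy)/2, mx, my, tol2, depth+1);
        flatten_quad(out, n, mx, my, (cx+x1)/2, (cy+y1)/2, x1, y1, tol2, depth+1);
    } else {
        emit_point(out, n, x1, y1);
    }
}

/* Pass 0 counts the points, then we do one exact-size malloc,
   then pass 1 redoes the same work and fills the array.
   Returns NULL on allocation failure; caller frees the result. */
static Point *flatten_two_pass(float x0, float y0, float cx, float cy,
                               float x1, float y1, float tol, int *count)
{
    Point *pts = NULL;
    int pass;

    for (pass = 0; pass < 2; ++pass) {
        int n = 0;
        emit_point(pts, &n, x0, y0);
        flatten_quad(pts, &n, x0, y0, cx, cy, x1, y1, tol*tol, 0);
        if (pass == 0) {
            pts = (Point *)malloc((size_t)n * sizeof *pts);
            if (!pts) return NULL;
            *count = n;
        }
    }
    return pts;
}
```

The tradeoff is the one described above: the subdivision work is done twice, but the only allocator operations needed are a single exact-size malloc and a free, with no realloc and no over-allocation.
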