How the stb_truetype Anti-Aliased Software Rasterizer v2 Works

Sean Barrett, 2015-07-15

Abstract

Previous versions of stb_truetype to 1.06 used a simple extension of the traditional scanline filling algorithm to compute horizontally-antialiased scanlines combined with vertical oversampling. The stb_truetype 1.06 algorithm (a re-invention of an algorithm generally credited to Levien's LibArt) computes the exact area of the concave polygon clipped to each pixel (within floating point precision) by computing the signed-area of trapezoids extending rightwards from each edge, and uses that signed-area as the AA coverage of the pixel. This provides more accuracy than sampling, although it is incorrect when shapes overlap (as it measures the area of the shapes within the pixel, not the fraction of the pixel that the shapes obscure), and is slightly faster than the old algorithm.

Traditional scan conversion

To determine if a point is within a concave-with-holes polygon, we classify each polygon edge with a direction (making each polygon a "winding"). Then, we cast a ray from the point to infinity, and count the edges it crosses, using a signed number for each edge (+1 in one direction, -1 in the other direction). If the final sum is non-zero, then the point is inside the polygon (assuming a non-zero fill rule).

The classic algorithm for filling such a polygon simply computes the same calculation incrementally.

For any scanline (e.g. the green lines above), we can start at negative infinity (i.e. anywhere to the left of the polygon) and scan along the line, counting crossings. Whenever the crossing count goes from zero to non-zero, we begin a filled region; whenever it goes from non-zero to zero, we end a filled region. For a closed polygon, it will always end up zero if it started at zero. (There are various things one must be careful about, e.g. if a vertex falls exactly on a scanline, we must be careful how we count those edges as 'crossing' that scanline. These are engineering details that don't affect the algorithm, and don't actually require much or any code if the right conventions are used.)

To actually implement this, a typical algorithm is:

Gather all of the directed edges into an array of edges
Sort them by their topmost vertex
Move a line down the polygon a scanline at a time, and for each scanline:
- Add to the "active edge list" any edges which have an uppermost vertex before this scanline, and store their x intersection with this scanline (because the edge list has been sorted by topmost y, edges from the edge list are always added in order)
- Remove from the active edge list any edge whose lowestmost vertex is above this scanline
- Sort the active edge list by their x intersection (incrementally, as it doesn't change much from scanline to scanline)
- Iterate the active edge list from left to right, counting the crossing-number, and filling pixels as filled edges start and end
- Advance every active edge to the next scanline by computing their next x intersection (which just requires adding a constant dx/dy associated with the edge)

Traditional scan conversion with 1D anti-aliasing

For anti-aliasing, we want to know approximately how much of each pixel is covered by the polygon. For efficient implementation, we treat the pixel as a little square and literally just measure the coverage of that square; this is equivalent to using a box filter, which while not necessarily ideal is a reasonable trade-off of performance for quality. (Doing anything other than a box filter would be much, much more complicated.)

I'm not sure if I read this algorithm somewhere or if I (re-)invented it myself, but it is a pretty straightforward extension. In the above algorithm, we treat scanlines as infinitely-thin 1D lines, but then we only sample the crossing-count at (say) pixel centers, despite the fact that we have more information along the horizontal axis.

Here we've divided the lower "scanline" into multiple pixels, where each blue tick mark represents the boundary between pixels. We can easily measure how much of the scanline line segment between two pixels is "inside" the polygon, by coloring the parts of the line that are inside a different color:

Here we can see that pixel #1 is about 25% pink, pixel 3 is about 80% pink, and pixel #4 is 100% pink. Thus we can determine that the (one-dimensional) anti-aliasing "coverage" of each of those pixels are 25%, 80%, and 100%.

The algorithm for computing this is identical to the previous algorithm; just the rule for how to fill pixels is different, as we have to track the fractional coverage of each "pixel" (line segment). Doing this is easy, though, since the edge crossings are always sorted left-to-right.

stb_truetype 1.05 and earlier use an algorithm like the above. Rather than computing the above on every scanline, the software "oversamples" along the y axis, placing 5 "scanlines" per row of pixels, and letting the computation for each one contribute only 1/5th of the "coverage" value computed for each pixel. (For small characters, stb_truetype oversamples by a factor of 15; 5 and 15 are used because they divide 255 evenly, so they avoid the need for any fixup of the final sum, which must be 0..255.)

The "New" Algorithm

I imagine this "new" algorithm I developed for stb_truetype 1.06 has already been published, although I was unable to find it anywhere (at least anywhere not behind a paywall).

The core idea behind the algorithm, using signed areas, came from Maxim Shemanarev's Anti-Grain Geometry source code; my coworker Fabian Giesen investigated that code to learn how the anti-aliased rasterizer worked, and came back with an incomplete understanding of the details, but a suspicion of that core idea. He shared the core idea with me, and I reworked a brand new algorithm from first principles based on it.

Edit: The idea itself appears to have reached AGG from FreeType which derived it from Ralph Levien's LibArt; whether it came from him originally or was a published algorithm is unknown.

Edit: I have now found other write-ups of this algorithm: here and here; however, I think this article is still generally clearer and more thorough, and some of the ideas are handled slightly differently.

Signed area

A classic algorithm for measuring the area of a concave polygon is to compute the signed area of a number of triangles, one per edge, and sum them

as this merely requires computing a cross-product per edge.

Perehaps less frequently used, but based on the same principle, one can measure the area as the sum of the signed area of a number of right-trapezoids:

The area of an axially-aligned right-trapezoid is particularly easy to compute, and these fit very well into the scanline framework.

Signed-area per pixel algorithm

Conceptually, the "new" AA algorithm computes the area of the entire polygon clipped to each pixel and uses that as the coverage for the pixel. Very little clipping is actually involved, and in practice most of the algorithm is nearly identical with the "active edge list" logic from the previous algorithm.

Gather all of the directed edges into an array of edges
Sort them by their topmost vertex
Move a line down the polygon a scanline at a time (where a scanline is now 1-pixel tall, rather than infinitely thin), and for each scanline:
- Add to the "active edge list" any edges which have an uppermost vertex within this scanline, and store their x intersection with this scanline (because the edge list has been sorted by topmost y, edges from the edge list are always added in order)
- Remove from the active edge list any edge whose lowestmost vertex is above this scanline
- Compute the signed-area pixel coverage of the scanline (discussed below)
- Advance every active edge to the next scanline by computing their next x intersection (which just requires adding a constant dx/dy associated with the edge)

The differences here--beside the per-scanline processing discussed below--consist only of treating the scanline as 1-pixel tall (which slightly changes the rules for when edges are added and removed from the list), and omitting the need to sort the active edges from left to right.

Signed-area scanline rasterization

The basic premise is we want to use signed-trapezoid areas to compute the pixel areas. The illustration above showed trapezoids filled to the bottom edge, but we'll use trapezoids filled horizontally, to the right edge of the bitmap.

In other words, we want to compute something like the following (where color represents contribution from different edges, not the sign of the area):

Here, each edge contributes to a right-extending trapezoid that covers multiple pixels, and theoretically extends infinitely far to the right. (The shapes within a single pixel may not be a trapezoid, e.g. the third pink shape, but it decomposes easily into a trapezoid and a rectangle.)

Later, we might close this polygon by drawing edges further to the right with the opposite signed area; these signed areas will extend rightwards, cancelling out the signed area in the pixels to the right of those new edges and producing 0-area per-pixel in those farther-right pixels.

Note first that we aren't trying to measure the area of the whole polygon, just the area coverage of each pixel, so we don't actually care about the signed area of the trapezoid from the edge all the way to the right; rather we just care about the signed area within each pixel.

Note second that the area in each pixel covered by the trapezoid for a given edge varies in the pixels that the edge passes through, but is constant for all of the pixels further to the right. (E.g. the three green rectangles are the same area, and if there were more pixels to the right, they would have the same coverage as the rightmost pixel.)

So the algorithm for the scanline is:

   For each active edge:
      For each pixel the edge intersects:
         Compute the rightwards-trapezoid-ish area covering this pixel
         Add the above area to the signed area for this pixel
      For each pixel right of the edge:
         Add the "height" of the edge within the scanline to the signed area for this pixel

The last line is because the areas to the right are always rectangles with width=1 pixel, so the area is the height*1, i.e. just height.

Note that there is no need to traverse the active edges in any particular order, as the signed sums will be the same regardless, which is why the algorithm no longer keeps the active edges sorted horizontally.

Minimizing right-of-edge processing

To achieve efficiency, it is necessary to implement the last two lines efficiently; accumulating into all of the pixels to the horizontal edge of the bitmap would be inefficient with many shapes; with e edges crossing a scanline that is n pixels wide, we might have to perfom as many as e*n add operations.

To avoid this, we use the inverse of a cumulative sum table. If we make a table X and then later compute a cumulative sum S of X (where S[0] = X[0], S[n] = S[n-1] + X[n]), then we can create the effect of filling S[j..infinity] by simply writing a value into X[j]. Because everything is linear, we can likewise create the effect of adding a value height into S[j..infinity] by simply adding height into X[j].

In practice, we never compute the table S[]; the algorithm looks like this:

  For one scanline:
  1. Initialize arrays A, X to 0
  2. Process e edges (see next section), summing areas into A & X
  3. Let s = 0
  4. For i = 0 to n do:
  5.    s = s + X[i]
  6.    a = A[i] + s
  7.    pixel[i] = 255 * a

This makes one linear pass over the whole pixel array, rather than the as many as e passes as the naive algorithm would.

Trapezoidal signed areas for pixels

The only thing missing from the algorithm above is how to compute the signed area coverage for any pixel which a given edge intersects. Computing this involves a large number of cases, which fortunately we can reduce to a small number of cases.

First, to avoid complexity, we test whether the edge's leftmost and rightmost x coordinates lie within the pixel range [0..n). In a font rasterizer, they normally always do; if they do not, we fall back to a less efficient algorithm that explicitly clip the edge's trapezoid to each pixel.

This means we never have to concern ourselves with the cases where the x coordinates lie outside the range of the A[] and X[] arrays, which reduces the cases to consider, i.e. removes the need for bounds-checking.

A given edge can have at most two intersections with a pixel, but those intersections can be with various combinations of the top and bottom and the sides. Additionally, an edge may have only one intersection (with any of the 4 sides), or no intersections at all.

To avoid this complexity, we can treat an intersection with the top the same as the top vertex lying within the pixel, and an intersection with the bottom the same as the bottom vertex lying within the pixel. The math is not identical as is, but we avoid a case explosion by handling them as the same kind of cases, by simply computing relative to the clipped endpoints, whether those be at the edges or somewhere inside.

Case 1: The edge touches one pixel

Here two of the four possible versions of this case are shown (the other two intersect only at the top or the bottom).

The calculation of the trapezoidal area covered to the right of these edges is straightforward. If the x coordinates are measured from the right side of the pixel, then the trapezoid's area is the height times the average of the two x-coordinates.

Case 2: The edge touches two or more pixels

Computing the area covered by the right-extending trapezoids for these edges isn't much harder. In each case we must compute the intersections with the left and right sides of the pixels to evaluate the trapezoid formula mentioned above. However, once we compute the leftmost pixel's right-side intersection, the following intersections all increase linearly; in the same way that the vertical scanning algorithm simply steps the current x by dx/dy, so too can we simply step the y value of the side intersection by dy/dx.

But, even simpler than that, the area covered of each successive pixel itself increases linearly; that is, if there are 5 pixels covered, the area of pixels #1 and #5 are "arbitrary", but the difference between the area of pixel #2 and pixel #3 is the same as the difference between the area of pixel #3 and pixel #4; and that difference itself is also dy/dx.

Thus, handling this case requires:

  1. Compute the intersection of the edge with right side of the leftmost pixel
  2. Compute the area of the triangle in the leftmost pixel
  3. Compute the area of the trapezoid in the second pixel
  4. Process successive pixels, adding dy/dx to the area
  5. Compute the intersection of the edge with left side of the rightmost pixel
  6. Compute the area of the shape in the rightmost pixel

although this is not exactly how the math is computed in stb_truetype.h.

Case 3: Edges sloping in the opposite direction

Case 2 is described as processing the pixels left-to-right, but the edges may slope opposite the direction shown in case 2, which can lead to incorrect math to compute e.g. step 6. However, we note that the area covered in each pixel in the shape below is identical to that covered in the second shape.

Thus, we can write Case 2 strictly in terms of e.g. edges that slope NE-SW (regardless of direction); to handle edges that slope NW-SE, we simply flip the edge vertically (not swapping the endpoints, but actually flipping the y-coordinates and negating the slopes), and then use Case 2.

Alternatively, it may be possible to simply use the same code for both slopes, through judicious usage of absolute-value operations when computing the areas, and this may be faster as well, but I didn't try it.

Sample code for Case 2 & 3

This is the code in stb_truetype for Case 2 & 3, which actually converts everything into NW-SE slopes, which is the opposite of the Case 2 illustrations.

{
   int x,x1,x2;
   float y_crossing, step, sign, area;
   // covers 2+ pixels
   if (x_top > x_bottom) {
      // flip scanline vertically; signed area is the same
      float t;
      y0 = y_bottom - (y0 - y_top);
      y1 = y_bottom - (y1 - y_top);
      t = y0, y0 = y1, y1 = t;
      t = x_bottom, x_bottom = x_top, x_top = t;
      dx = -dx;
      dy = -dy;
      t = x0, x0 = xb, xb = t;
   }

   x1 = (int) x_top;
   x2 = (int) x_bottom;
   // compute intersection with y axis at x1+1
   y_crossing = (x1+1 - x0) * dy + y_top;

   sign = e->direction;
   // area of the rectangle covered from y0..y_crossing
   area = sign * (y_crossing-y0);
   // area of the triangle (x_top,y0), (x+1,y0), (x+1,y_crossing)
   scanline[x1] += area * (1-((x_top - x1)+(x1+1-x1))/2);

   step = sign * dy;
   for (x = x1+1; x < x2; ++x) {
      scanline[x] += area + step/2;
      area += step;
   }
   y_crossing += dy * (x2 - (x1+1));

   scanline[x2] += area + sign * (1-((x2-x2)+(x_bottom-x2))/2) * (y1-y_crossing);
   scanline_fill[x2] += sign * (y1-y0);
}

Performance

This algorithm runs slightly faster than the 5-times oversampled algorithm used in previous versions of stb_truetype.h. Since the 5-times-oversampled algorithm has to process 5 times as many scanlines, this may seem less speedup than expected; however, despite having the same overall framework, it has to do a lot more work.

The performance-relative differences are:

Needs to sum A[] and X[], rescale, and convert from float to integer
Precomputes dy/dx per edge (to avoid having to do multiple divides later)
Needs to compute two y-intersections per horizontal-ish edge (divide-free)
Trapezoid math computation (two or three multiplies, including one for sign)
Still sorts initial edges vertically, but doesn't need to sort active edges horizontally

In particular, all of the code above for Case 2 is handled with oversampling by simply oversampling using the normal trivial code, whereas Case 2 involves a lot of processing per edge (although the per-pixel costs are fairly low for edges that cover many pixels).

Limitations

The algorithm exactly computes the area of polygons intersecting a given pixel. However, if multiple overlapping polygons intersect the same pixel, it is measuring the area of the polygons, not the fraction of the pixel that is covered.

For example, if the entire shape above were within a single pixel, the algorithm would correctly compute the coverage of the pixel. However, if the interior hole were wound in the opposite direction, it would cease to be a hole, and the entire shape would be filled. In this case, the new algorithm would report the signed area of the shape as the sum of the outer shape and the inner shape, when the actual coverage of the pixel would simply be the area of the outer shape (the area covered by the inner shape is already counted in the outer shape's area, so that area is double-counted).

In practice large double-covered areas will be opaque, and the double-counting will be clamped to 100% coverage and have no effect. I believe in practice most errors due to double-counting will occur at concave corners where two shapes meet, leading to a single darker pixel in those areas. I believe that excessive darkening of concave corners will be unobjectionable.

It would be possible for someone to author a font where every character has the whole shape repeated twice or more; a traditional rasterizer, including the old stb_truetype one, would draw this identically to there only being a single copy of the shape, whereas the new one would draw all pixels with twice as much AA coverage (clamped), which would be an objectionable artifact. I do not expect to find fonts doing this in the wild, though. If it does turn out to be objectionable, the stb_truetype still has the old rasterizer available on a compile-time switch.

home