On Optimal Static Grid Pathfinding - Sean Barrett - 2012

It always seemed to me like pathfinding on a grid of static data should be able to be done much faster by querying some kind of database, but I never found anything very good. Here I'll try to reconstruct the best thing I worked out in 2005. I didn't publish it because it didn't seem very useful. It seems a bit like "TRANSIT on Video Game Maps" (Antsfeld et al., 2012), although less flexible.

Given an N x N grid of nodes, subdivide it into M x M cells, each K x K (so K*M = N). M is arbitrary, but for simplicity use sqrt(N). (Something in simple proportion to sqrt(N) will be optimal.)

The perimeter nodes are the ones on the border of each cell (leading into adjacent cells). Any path that begins or ends within a cell and doesn't stay within that cell must cross its perimeter nodes.

To find a path from A to B, locate the cells for A and B. If they are in the same cell, this algorithm doesn't help: it only finds paths that go through the perimeter, not cell-interior paths, so you must run a local pathfind (but since it's confined to one cell, it's not too long). Assuming they're in different cells, break the path into three phases: from A to the perimeter of A's cell; from A's cell's perimeter to B's cell's perimeter; from B's cell's perimeter to B.

Algorithm for the shortest path:

    for every node X on the perimeter of A's cell
       for every node Y on the perimeter of B's cell
          sum the costs of A to X, X to Y, and Y to B
    take the minimum over all pairs

(A C sketch of this query loop appears below, after the storage analysis.)

If each of these costs can be looked up in unit time from precomputed tables, the time this takes is the size of the perimeters squared. Each K x K cell has ~4K perimeter nodes (actually 4K - 4), so the total time is ~16K^2. Assuming the sqrt(N) proportionality, this is proportional to N (e.g. 16*256 time units for a 256x256 map).

Space requirements, assuming we can store cost+direction in 32 bits and there's no compression:

    shortest-path cost from cell perimeter to contents of cell:
       per cell:  4K * K^2 * 4 bytes
       total:     4K * K^2 * M^2 * 4 = 16*K*N^2 bytes

    shortest-path cost from any perimeter node to any other perimeter node:
       (4K * M^2) * (4K * M^2) * 4 bytes = 64*K^2*M^4 = 64*N^2*M^2 bytes

Before we go further with these numbers, we can do better. If we look at what we're encoding for these perimeters graphically, we can see we're storing entire rows/columns of precomputed data, but with pairs of rows/columns adjacent for adjacent cells. If instead of using perimeters we put "boundaries" between cells, and use those boundaries, we can reduce the total amount of data we need. Now it's:

    shortest-path cost from cell perimeter to contents of cell:
       16*K*N^2 bytes (unchanged)

    boundary node to boundary node precomputation:
       (a row/column of boundary nodes is N wide, and there are ~M rows and ~M columns)
       (2*M*N) * (2*M*N) * 4 bytes = 16*M^2*N^2 bytes

So the total storage needed is:

    16*K*N^2 + 16*M^2*N^2 = 16 * (K + M^2) * N^2 bytes

So we want to minimize K + M^2, where K*M = N. Instead of looking at derivatives, let's just examine a common case, N = 256:

      N     K     M   K+M^2
    256     4    64    4100
    256     8    32    1032
    256    16    16     272
    256    32     8      96
    256    64     4      80
    256   128     2     132

But as K grows larger, the algorithm gets slower (quadratic in K). So let's suppose K=32 and M=8. How much storage does this algorithm require?

    16 * (32 + 8^2) * N^2 = 16 * 96 * 65536 bytes = 96 MB

That's too large to be very useful (the TRANSIT paper cites 5MB and 10MB for CPD/TRANSIT on the synthetic Rooms-32 256x256 map).
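Concretely, the query loop might look something like this in C. This is just a sketch; the table layout is one arbitrary choice (flat arrays already sliced down to the two cells involved, both with P perimeter nodes, and INT_MAX marking an unreachable pair).

    #include <limits.h>

    /* Query phase only; building the tables is the real work and isn't shown.
       Returns the best A-to-B cost through the perimeters, or INT_MAX if no
       such path exists (the same-cell case falls back to a local pathfind). */
    int query_cost(const int *a_to_perim,      /* cost from A to i-th perimeter node of A's cell */
                   const int *perim_to_b,      /* cost from j-th perimeter node of B's cell to B */
                   const int *perim_to_perim,  /* [i*P + j] = cost between those perimeter nodes */
                   int P)                      /* perimeter nodes per cell, ~4K-4                */
    {
       int best = INT_MAX;
       for (int i = 0; i < P; ++i) {
          int ax = a_to_perim[i];
          if (ax == INT_MAX) continue;
          for (int j = 0; j < P; ++j) {
             int xy = perim_to_perim[i*P + j];
             int yb = perim_to_b[j];
             if (xy == INT_MAX || yb == INT_MAX) continue;
             if (ax + xy + yb < best)
                best = ax + xy + yb;
          }
       }
       return best;
    }

The double loop is exactly the ~16K^2 work counted above; all the interesting parts are in precomputing the tables, not in the query.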
The main difference from TRANSIT seems to be that TRANSIT figures out which perimeter nodes are actually needed and limits the data to those (so there may be a worst case that's almost as bad), and it also has a more general approach to region chunking which I don't 100% know how to compare, but which might basically be equivalent to somehow allowing K and M to vary independently of each other.

Maybe it's interesting for small maps, though, since it's perhaps simpler than CPD or TRANSIT. If you had a 64x64 map, with K=16 and M=4, and stored 2-byte distances, you'd need 32*8*64*64 = 1MB total.

In fact, let's precompute the within-cell path costs as well, to optimize the case where you don't leave a cell. Each cell has K^2 nodes, so we need to store (K^2)*(K^2) precomputed paths per cell, across M*M cells. If we can use this data to compute point-in-cell-to-point-on-perimeter costs, we end up needing:

    (2*K^2 + 8*M^2) * N^2 bytes

(This is slightly wrong because we switched from perimeters to boundaries, so we actually need to precompute over (K+1)^2 nodes per cell.)

Now we set K=M=8, and we get:

    (2*64 + 8*64) * 64^2 = (10*64) * 64^2 = 2.5MB

If we set K=16, M=4 we get:

    (2*256 + 8*16) * 64^2 = (5*128) * 64^2 = 2.5MB

Since K=8 is faster, we prefer K=M=8. Computing a pathfind operation then requires summing roughly 16*K^2 ≈ 1000 triples out of the tables. (The small program at the end of this note reproduces these storage numbers.)

Also, just to be clear how little we're saving: precomputing the full data set would require 2*N^4 = 32MB. So it's probably not very interesting. (On the other hand, we can also compute it faster than the full data set; it might be feasible to compute on level load in a downloadable title. However, doing proper A* at run-time on 64x64 maps is probably not problematic performance-wise in the first place.)

It seems like these approaches offer some kind of speed-versus-size tradeoff: fully precomputed is O(1) time with O(N^4) precomputed data; the approach above requires O(K^2) time and O(M^2 * N^2) data; pathfinding on the raw grid requires O(N^2) time and O(N^2) data (storing the basic edge costs, plus equal runtime storage). Note that K^2 * M^2 is N^2. Thus all three of the above pathfinding algorithms have a worst case of O(N^4) combined time * space. Is this a provable minimum? (It might be for arbitrary graphs but not for grids.)

It would be interesting to evaluate TRANSIT's and CPD's worst cases under this model. They at least have the advantage that they CAN do better than their worst case, whereas the algorithm described here uses the same amount of performance/storage for sensible, coherent game maps as for random maps with random edge costs, which isn't a good trade-off in practice.
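As a sanity check on the arithmetic, here's a tiny C program that just evaluates the two storage formulas from above (the function names are only for this sketch); it prints ~96 MB for the N=256, K=32, M=8 configuration and ~2.5 MB for the 64x64, K=M=8 one.

    #include <stdio.h>

    /* Boundary scheme: 16*K*N^2 (perimeter-to-interior) plus 16*M^2*N^2
       (boundary-to-boundary), generalized to bpe bytes per table entry. */
    static double boundary_scheme_bytes(double N, double K, double M, double bpe)
    {
       return (4*K + 4*M*M) * bpe * N*N;
    }

    /* Variant where the full in-cell all-pairs table (K^2 x K^2 entries per
       cell, M^2 cells) replaces the perimeter-to-interior table; with
       bpe = 2 this is the (2*K^2 + 8*M^2)*N^2 figure above. */
    static double with_in_cell_table_bytes(double N, double K, double M, double bpe)
    {
       return (K*K + 4*M*M) * bpe * N*N;
    }

    int main(void)
    {
       printf("N=256, K=32, M=8, 4-byte entries: %.1f MB\n",
              boundary_scheme_bytes(256, 32, 8, 4) / (1024*1024));   /* ~96 MB  */
       printf("N=64,  K=8,  M=8, 2-byte entries: %.1f MB\n",
              with_in_cell_table_bytes(64, 8, 8, 2) / (1024*1024));  /* ~2.5 MB */
       return 0;
    }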