Efficiently Computing Bridgelets

Kevin Buchin
@kbuchin

Aleksandr Popov
@aleqss

This note accompanies the code available on github.

KrummΒ [1] has recently introduced the concept of bridgelets. In his work, he aims to use a data-driven approach to interpolate trajectories between the measured locations. The main idea is to compute probabilities for visiting certain regions in between measurements. The regions in question are the cells of a square grid that subdivides the location space. By also discretising time, Krumm arrives at the model where at each time step, the trajectory can stay in the current cell or move to one of its direct neighbours. He uses this model to compute visit probabilities of cells in a grid based on a set of real-life trajectories.

Illustration of a bridgelet from (0, 0) to (40, 20) in T = 400 steps. Figure shows five walks drawn uniformly at random from \approx 7\cdot 10^{273} walks in the bridgelet. Walks were sampled in \mathcal{O}(T) time (after \mathcal{O}(T^3)-time preprocessing) using the dynamic program.

The dynamic program makes computing with bridgelets more efficient, since it avoids constructing the bridgelets explicitly. This makes it possible to use a finer grid and time steps, or to reasonably cover sparser trajectories. A bridgelet is the set of all walks between two given vertices of the grid in TT time steps. Krumm enumerates the walks explicitly to count the visits to the cells, which takes time exponential in TT, making it impractical to use for T>12T > 12. Our dynamic programming approach computes all the relevant statistics in π’ͺ(T3)\mathcal{O}(T^3) time instead of π’ͺ(5T)\mathcal{O}(5^T), making it feasible to compute the relevant statistics easily at least up to T=500T = 500. With sufficient parallelisation and optimisation, it is possible to make the approach even faster, as we only need π’ͺ(T)\mathcal{O}(T) time when computing on π’ͺ(T2)\mathcal{O}(T^2) computing units in parallel. Furthermore, we show how to use this information to generate random trajectories that follow a certain distribution, in π’ͺ(T)\mathcal{O}(T) time per trajectory of TT steps.

Running times in milliseconds for the two approaches with T=12T = 12.
Problem DP Explicit
All paths 1 260331
Visiting paths from (0,0)(0, 0) to (1,1)(1, 1) 4 273017

We also provide closed-form expressions to compute the number of walks to a specific cell in TT steps, and show how to obtain a formula for the visit probabilities. Specifically, the number of walks in a bridgelet from (0,0)(0, 0) to (x,y)(x, y) in TT steps is |W(x,y,T)|=βˆ‘i=0⌊k/2βŒ‹(Tkβˆ’2i)(Tβˆ’k+2ii)(Tβˆ’k+2ix+i),\lvert W(x, y, T)\rvert = \sum_{i = 0}^{\lfloor k / 2\rfloor} \binom{T}{k - 2i}\binom{T - k + 2i}{i}\binom{T - k + 2i}{x + i}, where k=Tβˆ’xβˆ’yk = T - x - y.

Dynamic Program

Some simple questions stated above can be solved by means of dynamic programming. The basic idea is to count paths that reach a certain cell (x,y)(x, y) in TT steps, so we can count paths reaching this cell and its neighbours in T+1T + 1 steps. With some care and exploiting the symmetry of the problem, we can use this simple idea to also compute the visit probabilities for all cells on any possible path from (0,0)(0, 0) to (x,y)(x, y).

Counting Paths

The simplest question is the following: given a time limit TT and the point (x,y)(x, y), count the distinct paths in W(x,y,T)W(x, y, T). By means of dynamic programming, we can even solve a more general question efficiently: given the time limit TT, count the paths in W(x,y,t)W(x, y, t) for all possible grid points (x,y)(x, y) and for all 0≀t≀T0 \leq t \leq T. This can be accomplished by creating a table PP on the coordinates [βˆ’T,T]Γ—[βˆ’T,T]Γ—[0,T][-T, T] \times [-T, T] \times [0, T] and filling it out as follows: P(0,0,0)=1,P(x,y,0)=0for all xβ‰ 0 or yβ‰ 0,P(x,y,t+1)=P(x,y,t)+P(xβˆ’1,y,t)+P(x+1,y,t)+P(x,yβˆ’1,t)+P(x,y+1,t)for all x, y and for all tβ‰₯0.\begin{aligned} P(0, 0, 0) &= 1\,,\\ P(x, y, 0) &= 0 \quad\text{for all $x \neq 0$ or $y \neq 0$,}\\ P(x, y, t + 1) &= P(x, y, t) + P(x - 1, y, t) + P(x + 1, y, t)\\ &+ P(x, y - 1, t) + P(x, y + 1, t) \quad\text{for all $x$, $y$ and for all $t \geq 0$.}\\\end{aligned}

Here and later we assume that out-of-bounds values in the table are 00. It is easy to see that after filling the entire table, a cell P(x,y,t)P(x, y, t) for some (x,y)∈[βˆ’T,T]2(x, y) \in [-T, T]^2 and t∈[0,T]t \in [0, T] holds the count of paths in W(x,y,t)W(x, y, t). Furthermore, for any (x,y)(x, y) out of range, the value should clearly be 00, as those points are not reachable in TT steps. The dynamic program can clearly be computed in time π’ͺ(T3)\mathcal{O}(T^3). We show an example result for T=10T = 10 below.

The heat map for path counts at T = 10, in raw form (left) and on a logarithmic scale (right). The white cells are unreachable.

Counting Paths Through a Vertex

Secondly, we can solve the following question: given a time limit TT and the point (x,y)(x, y), count the paths in W(x,y,T)W(x, y, T) that go through a vertex (a,b)(a, b). We can compute these counts efficiently for all possible vertices (a,b)(a, b). We give further details here.

The heat map for counting paths that visit the respective cells on a path from (0, 0) to (1, 1) with T = 10, in raw form (left) and on a logarithmic scale (right). The white cells are not visited.

Counting with Obstacles

The approach is quite flexible. In particular, it can be adapted to counting paths on a grid with holes that represent obstacles.

The heat map for path counts at T = 10, in the presence of a wall with a gap, in raw form (left) and on a logarithmic scale (right). Note how the cells on the right side are not symmetric to the ones on the left side and, in particular, the count in the cell (2, -1) is lower than in (2, -2).
The heat map for path counts at T = 10, in the presence of a smaller wall with a gap, in raw form (left) and on a logarithmic scale (right).
The heat map for path counts at T = 10, in the presence of a small wall, in raw form (left) and on a logarithmic scale (right).

Suppose we have some representation SS of a set of obstacles. Depending on the application, it may be best represented as a boundary, or directly as a set of cells; its complexity affects the running time. Then we can formulate the following DP for counting paths in P(x,y,t)P(x, y, t) for all (x,y)(x, y) and all t≀Tt \leq T: Po(0,0,0)=1,Po(x,y,0)=0for all xβ‰ 0 or yβ‰ 0,Po(x,y,t+1)=0for all (x,y)∈S,Po(x,y,t+1)=Po(x,y,t)+Po(xβˆ’1,y,t)+Po(x+1,y,t)+Po(x,yβˆ’1,t)+Po(x,y+1,t)for all (x,y)βˆ‰S and for all tβ‰₯0.\begin{aligned} P_o(0, 0, 0) &= 1\,,\\ P_o(x, y, 0) &= 0 \quad\text{for all $x \neq 0$ or $y \neq 0$,}\\ P_o(x, y, t + 1) &= 0 \quad\text{for all $(x, y) \in S$,}\\ P_o(x, y, t + 1) &= P_o(x, y, t) + P_o(x - 1, y, t) + P_o(x + 1, y, t)\\ &+ P_o(x, y - 1, t) + P_o(x, y + 1, t) \quad\text{for all $(x, y) \notin S$ and for all $t \geq 0$.}\\\end{aligned} We discuss computing probabilities in the setting with obstacles here.

Generating Random Trajectories

Computing these statistics using dynamic programming makes it easier to also sample realistic trajectories from bridgelets. The easiest way is to generate the trajectory starting from the end point, one segment at a time, by reconstructing the transition probabilities using the counts stored in the DP. The approach takes π’ͺ(T)\mathcal{O}(T) time per trajectory, assuming the DP with path counts for the time limit not smaller than TT is precomputed (which can be done in π’ͺ(T3)\mathcal{O}(T^3) time). We show some trajectories from (0,0)(0, 0) to (40,20)(40, 20) in T=400T = 400 steps in FigureΒ 1. These are generated based on the values of PP, without obstacles. The trajectories appear to exhibit reasonable levels of randomness, with (discretised) wandering behaviour spread throughout the trajectory.

Closed-Form Expressions

We have shown the expression for the path count above; we show further details here (also includes a mathematical derivation for the visit probabilities).

[1] John Krumm. 2022. Maximum Entropy Bridgelets for Trajectory Completion. To appear in Proceedings of the 30th International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2022).