Unpacking the Log Odds Update Rule
This blog post assumes some prior knowledge of occupancy mapping. At the heart of the classical Bayesian solution to this problem is the log odds update rule
\[l(v_t) = l(v_{t-1}) + l(v \mid z_t) - l(v),\]where
- \(l(x)\) is the “log odds” of an event \(x\), equivalent to \(\log \frac{p(x)}{1 - p(x)}\)
- \(v\) represents a certain voxel in the world
- \(z_t\) represents measurements from a sensor of the world (imagine a LIDAR) at timestep \(t\)
- \(l(v_t)\) is the posterior log odds that \(v\) is occupied
- \(l(v_{t-1})\) is our prior log odds belief on \(v\) given previous sensor measurements
- \(l(v \mid z_t)\) is the log odds of the “inverse sensor model” \(p(v \mid z_t)\)
- \(l(v)\) is the prior log odds that \(v\) is occupied
That is,
\[\text{posterior log odds} = \text{previous log odds} + \text{log odds of sensor's belief} - \text{prior log odds}.\]Roughly speaking, adding log odds is reasonable since it corresponds to multiplying odds, which is exactly how Bayesian evidence accumulates. But what’s the role of the last term \(l(v)\)?
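Before digging into that question, here is a minimal Python sketch of the rule for a single voxel, working entirely in log odds space. The helper names are my own and not from any particular mapping library; this is just the formula above, made executable.

```python
import math

def log_odds(p: float) -> float:
    """Convert a probability to log odds, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def prob(l: float) -> float:
    """Convert log odds back to a probability (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-l))

def update_voxel(l_prev: float, p_inverse_sensor: float, p_prior: float) -> float:
    """One step of the rule: l(v_t) = l(v_{t-1}) + l(v | z_t) - l(v)."""
    return l_prev + log_odds(p_inverse_sensor) - log_odds(p_prior)

# Toy numbers: prior p(v) = 0.5, previous belief 0.6, inverse sensor model says 0.9.
l_post = update_voxel(l_prev=log_odds(0.6), p_inverse_sensor=0.9, p_prior=0.5)
print(prob(l_post))  # ~0.93
```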
An Alternate Form
To gain intuition, we derive the update rule in terms of probabilities. By Bayes’ rule, assuming independent sensor measurements, we have
\[p(v_t) = \frac{ \overbrace{p(z_t \mid v) \cdot p(v_{t-1})}^{\text{if $v$ is occupied}} }{ \underbrace{p(z_t \mid v) \cdot p(v_{t-1})}_{\text{if $v$ is occupied}} + \underbrace{p(z_t \mid \neg v) \cdot p(\neg v_{t-1})}_{\text{if $v$ is free}} }\]Since the forward sensor model \(p(z_t \mid v)\) is typically intractable (it requires marginalization over other voxel states), we rewrite it in terms of the inverse sensor model \(p(v \mid z_t)\), again using Bayes’:
\[p(z_t \mid v) = \frac{p(v \mid z_t) \cdot p(z_t)}{p(v)}.\]Substituting into the original equation and cancelling the \(p(z_t)\) terms, we end up with our final probability update rule:
\[p(v_t) = \frac{ \frac{p(v \mid z_t)}{p(v)} \cdot p(v_{t-1}) }{ \frac{p(v \mid z_t)}{p(v)} \cdot p(v_{t-1}) + \frac{p(\neg v \mid z_t)}{p(\neg v)} \cdot p(\neg v_{t-1}) }.\]
Interpretation
Notice that the “weight” on the occupied case is now \(\frac{p(v \mid z_t)}{p(v)}\) and the “weight” on the not occupied case is now \(\frac{p(\neg v \mid z_t)}{p(\neg v)} = \frac{1-p(v \mid z_t)}{1-p(v)}\). If the first weight is \(\ge 1\), then the second is \(\le 1\), and vice versa. This means that our belief that the voxel is occupied will increase if and only if \(p(v \mid z_t) > p(v)\). Put another way,
Our belief that the voxel is occupied will increase if and only if our sensor becomes more certain that the voxel is occupied after observing measurement \(z_t\).
This idea is surprisingly deep. For example, if our sensor outputs a probability of \(0.5\) with a prior of \(0.2\), then even if our previous belief is already high at \(0.8\), our posterior belief will still increase! This is because we trust that the sensor measurement, judged against the sensor’s own prior, is providing positive evidence that the voxel is occupied.
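As a quick numerical check, here is a small, self-contained Python sketch of the probability-form update rule derived above, plugged with these numbers (the function and variable names are mine):

```python
def posterior_prob(p_prev: float, p_inv_sensor: float, p_prior: float) -> float:
    """Probability-form update: weight each hypothesis by how much the
    measurement moved the sensor's belief relative to its prior."""
    w_occ = p_inv_sensor / p_prior                   # p(v | z_t) / p(v)
    w_free = (1.0 - p_inv_sensor) / (1.0 - p_prior)  # p(not v | z_t) / p(not v)
    num = w_occ * p_prev
    return num / (num + w_free * (1.0 - p_prev))

# Sensor says 0.5, the sensor's prior is 0.2, our previous belief is 0.8.
print(posterior_prob(p_prev=0.8, p_inv_sensor=0.5, p_prior=0.2))  # ~0.94 > 0.8
```

Even though the sensor itself only reports \(0.5\), the posterior climbs from \(0.8\) to about \(0.94\) (odds of \(16:1\)), because \(0.5\) is well above the sensor’s prior of \(0.2\).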
Humans do this naturally when evaluating advice from others. Imagine I’m trying to decide if a restaurant is good. I already think it’s excellent based on my experience. A professional food critic — someone known for being tough and hard to impress — visits and says, “It’s pretty good.” Because I expect critics to be harsh, even a mild compliment strongly boosts my belief that the restaurant is excellent. Later, a passionate foodie friend — who loves almost every place they visit — also says, “It’s pretty good.” Since I expect them to be enthusiastic about most restaurants, their lukewarm praise actually makes me question my belief a little.
Why Odds Notation is Useful
Going back to log odds notation, we can also quantify the impact of the sensor’s measurements on our belief as the “weight of evidence” \(l(v \mid z_t) - l(v)\). Let’s revisit the above example, where our sensor outputs \(0.5\) with a prior of \(0.2\). We have a previous belief of \(0.8\). What should our posterior belief be? In odds form, this is straightforward [source]:
\[\begin{align*} \text{prior odds} = 1:4 &\xrightarrow[]{\cdot 4 , \cdot 1} \text{sensor odds given measurement} = 4:4 \\ \text{previous belief odds} = 4:1 &\xrightarrow[]{\cdot 4 , \cdot 1} \text{posterior belief odds} = 16:1. \end{align*}\]After observing the measurement, the “weight” on the occupied case quadrupled for our sensor (its odds went from \(1:4\) to \(4:4\)). So to compute our posterior, we quadruple the “weight” on the occupied case as well!
This naturally extends to multiple states. Suppose our sensor can see color, and our voxel can now be occupied and red, occupied and blue, or free. We may have a belief update like
\[\begin{align*} \text{prior odds} = 1:2:3 &\xrightarrow[]{\cdot 3 , \cdot 1, \cdot 2} \text{sensor odds given measurement} = 3:2:6 \\ \text{previous belief odds} = 5:1:1 &\xrightarrow[]{\cdot 3 , \cdot 1, \cdot 2} \text{posterior belief odds} = 15:1:2. \end{align*}\]This notation is clearly very powerful for expressing Bayesian updates¹!
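In code, the multi-state update is just an elementwise multiplication of odds vectors. Below is a sketch with the same numbers (NumPy and the variable names are my choices, not part of any standard API):

```python
import numpy as np

def odds_update(previous_odds, sensor_odds, prior_odds):
    """Multiply the previous belief odds by the per-state evidence weights,
    i.e. the sensor's odds relative to its prior odds."""
    return previous_odds * (sensor_odds / prior_odds)

prior_odds    = np.array([1.0, 2.0, 3.0])  # occupied+red : occupied+blue : free
sensor_odds   = np.array([3.0, 2.0, 6.0])  # sensor's odds after the measurement
previous_odds = np.array([5.0, 1.0, 1.0])

posterior_odds = odds_update(previous_odds, sensor_odds, prior_odds)
print(posterior_odds)                         # [15.  1.  2.]
print(posterior_odds / posterior_odds.sum())  # normalize to recover probabilities
```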
The Atomicity Assumption
There’s a tricky gotcha here. You may wonder what happens if you collapse the “occupied red/blue” events. Naively, we would get
\[\begin{align*} \text{prior odds} = 3:3 &\xrightarrow[]{\cdot \frac{5}{3} , \cdot 2} \text{sensor odds given measurement} = 5:6 \\ \text{previous belief odds} = 6:1 &\xrightarrow[]{\cdot \frac{5}{3} , \cdot 2} \text{posterior belief odds} = 10:2 \neq 16:2. \end{align*}\]The apparent contradiction arises because it’s invalid to collapse states like this. If the sensor’s measurements can distinguish between “occupied red” and “occupied blue”, we must either model those states separately or agree with the sensor’s relative prior likelihoods between those states in order to apply the multiplicative update rule. I was unable to find a well-known term for this requirement, so let’s call it the atomicity assumption.
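Here is the same discrepancy in code, with the numbers above (the `collapse` helper is purely illustrative):

```python
import numpy as np

# Uncollapsed update over (occupied+red, occupied+blue, free).
prior    = np.array([1.0, 2.0, 3.0])
sensor   = np.array([3.0, 2.0, 6.0])
previous = np.array([5.0, 1.0, 1.0])
print(previous * sensor / prior)  # [15.  1.  2.]  -> collapses to 16:2

# Naively collapsing red/blue into a single "occupied" state before updating.
collapse = lambda odds: np.array([odds[0] + odds[1], odds[2]])
print(collapse(previous) * collapse(sensor) / collapse(prior))  # [10.  2.]  != 16:2
```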
Intuitively, if two independent measurements both suggest that the voxel is occupied and red, that should persuade me that the voxel is occupied more than if one measurement says occupied and red and the other says occupied and blue. Concretely, uncollapsed odds match our expectations
\[\begin{align*} \text{previous belief odds} = 1:1:1 &\xrightarrow[\scriptsize\text{measurement1 says occupied+red}]{\cdot 5, \cdot 1, \cdot 1} \text{posterior odds} = 5:1:1 \xrightarrow[\scriptsize\text{measurement2 also says occupied+red}]{\cdot 5, \cdot 1, \cdot 1} \text{posterior odds} = 25:1:1 \\ \text{previous belief odds} = 1:1:1 &\xrightarrow[\scriptsize\text{measurement1 says occupied+red}]{\cdot 5, \cdot 1, \cdot 1} \text{posterior odds} = 5:1:1 \xrightarrow[\scriptsize\text{measurement2 says occupied+blue}]{\cdot 1, \cdot 5, \cdot 1} \text{posterior odds} = 5:5:1 \end{align*}\]while collapsing would make the two cases indistinguishable
\[\begin{align*} \text{previous belief odds} = 2:1 &\xrightarrow[\scriptsize\text{measurement1 says occupied+red}]{\cdot 3, \cdot 1} \text{posterior odds} = 6:1 \xrightarrow[\scriptsize\text{measurement2 also says occupied+red}]{\cdot 3, \cdot 1} \text{posterior odds} = 18:1 \\ \text{previous belief odds} = 2:1 &\xrightarrow[\scriptsize\text{measurement1 says occupied+red}]{\cdot 3, \cdot 1} \text{posterior odds} = 6:1 \xrightarrow[\scriptsize\text{measurement2 says occupied+blue}]{\cdot 3, \cdot 1} \text{posterior odds} = 18:1. \end{align*}\](Assuming red and blue are equally likely a priori, the collapsed weight on the occupied case is the average of \(5\) and \(1\), hence the factor of \(\cdot 3\).)
What if the inverse sensor model is learned?
In modern settings, we may implement the inverse sensor model with a neural network. In this case, since the prior term originates from applying Bayes’ rule to the forward sensor model, it’s important to use the prior implicitly learned by the network (even if a better prior exists elsewhere).
How do we extract this prior? It seems to depend on the model architecture, training algorithm, and training data. One idea is to re-train your model, but with noise fed as inputs rather than the real measurements. In practice, simple proxies like class frequency may be good enough².
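For instance, the class-frequency proxy could look like the sketch below; `train_labels` is a hypothetical binary array (1 = occupied, 0 = free) of the voxel labels the inverse sensor model was trained on, and the result is only a stand-in for whatever prior the network actually internalized.

```python
import numpy as np

def prior_log_odds_from_frequency(train_labels: np.ndarray, eps: float = 1e-6) -> float:
    """Estimate the prior p(v) as the fraction of occupied voxels in the training
    labels, then convert to log odds. Clipping guards against log(0)."""
    p_occupied = float(np.clip(train_labels.mean(), eps, 1.0 - eps))
    return float(np.log(p_occupied / (1.0 - p_occupied)))

# Example: a dataset where 25% of labeled voxels are occupied.
print(prior_log_odds_from_frequency(np.array([0, 0, 0, 1])))  # log(1/3) ~ -1.10
```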
No matter how performant your learned inverse sensor model is, as long as it’s well-calibrated, we can use its outputs to update our beliefs. Put another way, no matter how “smart” or “dumb” your inverse sensor model is, as long as it doesn’t “lie”, we can use evidence from its measurements to update our beliefs.
Finally, to add to the list of broken assumptions: in addition to breaking the well-known cell independence and measurement independence assumptions, learned inverse sensor models often break the atomicity assumption. For example, if our measurement space is RGB images, our sensor may learn substates of “occupied” such as “occupied by a chair” or “occupied by a red object”. As shown above, this will lead to inaccurate updates, and how to mitigate this error term is an open question.
Footnotes / Open Questions
1. Why? I’m not entirely sure! The math certainly works out, assuming Bayes’ rule in probability form. Is there a more fundamental or intuitive explanation of this, maybe a visual proof? Or should we accept this as an axiom? ↩
2. Is there a training-free way to extract true priors from a trained neural network? ↩