5 easy Bayesian inference

A hungry mouse is inside its burrow and must decide what to do next. It can stay hidden, go outside to forage, or perhaps first gather more information by peeking or listening. The problem is that the mouse does not know the true state of the world: a fox may be nearby, or it may not. Instead, the mouse receives uncertain sensory cues, such as smell, a shadow, a sound, or movement in the grass. These cues are informative but imperfect. A shadow might mean danger, or it might be harmless.

The mouse therefore has two linked tasks. First, it must infer the hidden state: how likely is it that a fox is present? Second, it must choose an action. Foraging may lead to food, but it may also be dangerous. Hiding is safer but leaves the mouse hungry. Active inference frames this as choosing actions that reduce uncertainty while leading to preferred outcomes.

In this first example, we will set the stage by fully solving an easy case of Bayesian inference.

5.1 definitions

generative process: This is the outside reality, with all its complexity. Here it is the prairie, the vegetation, the wind, the fox, the birds, etc.
external state x^\star: This is the ground truth the mouse would love to know, but doesn’t have access to. Here it is the presence or absence of the fox in the vicinity of the burrow.
hidden state x: This is what the mouse is modeling in its head. That’s a terrible name, since x represents something the mouse holds beliefs about, and from its own point of view this is not hidden at all. Alas, that’s what everyone calls it. x is the internal state or internal representation of x^\star. Here, P(x) is the mouse’s belief about the fox’s presence, a vector with two entries, P(x=0) and P(x=1), where x=0 means that the fox is absent and x=1 that it is present.
sensory outcome y: This is what the mouse observes from the world. Here, y is the strength of the fox’s smell, which can be either strong (y=1) or faint (y=0).

5.2 Bayes’ rule

We will use Bayes’ rule to solve the inference problem:

P(x\mid y) = \frac{P(y\mid x)P(x)}{P(y)} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}

5.3 prior and likelihood

It’s daytime, and the mouse believes that the fox must be in its den sleeping. The prior will be

P(x)= \begin{bmatrix} P(x=0) \\ P(x=1) \end{bmatrix}= \begin{bmatrix} 0.98 \\ 0.02 \end{bmatrix}.

The numbers above mean that the mouse believes that there is a 98% chance that the fox is absent, and a 2% chance that it is present. This is the mouse’s belief before it receives any sensory information. Later on we will call this prior vector D.

The mouse’s past experience has taught it the likelihood that it will sense a strong or faint fox smell in the case that the fox is absent or present. Mathematically, the likelihood is written as P(y\mid x). We can write this as a matrix, where the columns correspond to the hidden state x and the rows to the sensory outcome y:

	x=0 (fox absent)	x=1 (fox present)
y=0 (faint smell)	0.7	0.1 (false negative)
y=1 (strong smell)	0.3 (false positive)	0.9

The false positive rate of 0.3 means that even when the fox is absent, there is a 30% chance that the mouse will receive a strong smell, maybe because of the lingering scent of a fox that passed by earlier, or because of the smell of a skunk. The false negative rate of 0.1 means that even when the fox is present, there is a 10% chance that the mouse will receive a faint smell, maybe because the wind is blowing in the opposite direction.

In a later chapter we will call this likelihood matrix A.

Both the prior and the likelihood are part of the mouse’s internal model of the world. Together, they are the mouse’s generative model. They are not necessarily true, but they are what the mouse believes to be true.

Let’s say that on a given day the mouse receives a strong smell. This corresponds to the second row of the likelihood matrix above, where y=1. The likelihood of receiving a strong smell given the hidden state is then:

P(y=1\mid x)= \begin{bmatrix} 0.3 \\ 0.9 \end{bmatrix}.

When we observe y=1, we extract the corresponding row from the A matrix and treat it as a vector of likelihoods for each possible state of x.

Note: while the elements of the prior vector P(x) sum to 1, the elements of the likelihood vector P(y=1\mid x) do not (here 0.3 + 0.9 = 1.2). This is because the likelihood is not a probability distribution over x, but rather a conditional probability distribution over y given x. Each column of the likelihood matrix, which fixes x, sums to 1; a row, which fixes y and varies x, need not. The likelihood tells us how likely it is to observe y=1 for each possible value of x, but it does not tell us how likely each value of x is.
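The difference between columns and rows of the likelihood matrix is easy to check numerically. The following is a minimal sketch in Python with NumPy; neither is used in the original text, and the variable names are chosen here purely for illustration.

```python
import numpy as np

# Likelihood matrix P(y | x): rows are outcomes y, columns are states x.
A = np.array([[0.7, 0.1],    # y=0 (faint smell)
              [0.3, 0.9]])   # y=1 (strong smell)

# Each column fixes x and is a probability distribution over y.
column_sums = A.sum(axis=0)

# The row for y=1 is the likelihood vector; it is not a distribution over x.
likelihood_y1 = A[1]
likelihood_y1_sum = likelihood_y1.sum()

print("column sums:", column_sums)
print("P(y=1 | x):", likelihood_y1, "sum:", likelihood_y1_sum)
```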

The numerator of Bayes’ rule is the product of the likelihood and the prior, which gives us:

P(y=1\mid x)P(x) = \begin{bmatrix} 0.3 \\ 0.9 \end{bmatrix} \odot \begin{bmatrix} 0.98 \\ 0.02 \end{bmatrix} = \begin{bmatrix}0.294 \\ 0.018 \end{bmatrix}.

Here we didn’t use matrix (or vector) multiplication, but rather element-wise multiplication, which is also called the Hadamard product.
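To make the distinction between the two kinds of multiplication concrete, here is a small sketch in Python with NumPy (an illustration added here, with hypothetical variable names). In NumPy, `*` is element-wise and keeps one entry per hidden state, whereas `@` is the dot product and collapses the two vectors into a single number.

```python
import numpy as np

prior = np.array([0.98, 0.02])         # P(x): [fox absent, fox present]
likelihood_y1 = np.array([0.3, 0.9])   # P(y=1 | x) for x=0 and x=1

# Hadamard (element-wise) product: the numerator of Bayes' rule.
numerator = likelihood_y1 * prior

# Dot product: sums the element-wise products into one scalar.
dot = likelihood_y1 @ prior

print("element-wise:", numerator)
print("dot product:", dot)
```

The element-wise result still has two entries, one per state, which is what Bayes’ rule needs in the numerator.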

5.4 evidence

The evidence is also called the marginal likelihood, and it is the probability of observing the sensory outcome y=1 (in this case) under the mouse’s model. It is computed by marginalizing over the hidden state x:

P(y=1) = \sum_x P(y=1\mid x)P(x).

This expression makes it obvious that the evidence is the sum of the numerator of Bayes’ rule across all possible hidden states. In our case, we have:

P(y=1) = 0.294 + 0.018 = 0.312.
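Because marginalizing over x is a sum of products, the evidence for every possible outcome can be obtained at once as a matrix-vector product of the likelihood matrix and the prior. The sketch below assumes Python with NumPy and uses the names A and D that the text says will be introduced later; it is an illustration, not the author’s code.

```python
import numpy as np

# Likelihood matrix P(y | x): rows are outcomes y, columns are states x.
A = np.array([[0.7, 0.1],
              [0.3, 0.9]])
# Prior P(x): [fox absent, fox present].
D = np.array([0.98, 0.02])

# Entry i is sum_x P(y=i | x) P(x), the evidence for outcome i.
evidence = A @ D

print("P(y=0), P(y=1):", evidence)
print("total:", evidence.sum())
```

The two entries sum to 1, as they must, since the mouse is certain to receive one of the two outcomes.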

5.5 posterior

Finally, we can compute the posterior distribution over the hidden state x given the sensory outcome y=1 (strong smell) using Bayes’ rule:

P(x\mid y=1) = \frac{P(y=1\mid x)P(x)}{P(y=1)} = \frac{\begin{bmatrix}0.294 \\ 0.018 \end{bmatrix}}{0.312} = \begin{bmatrix}0.9423 \\ 0.0577 \end{bmatrix}.
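The whole calculation (select the likelihood row, multiply by the prior, normalize by the evidence) fits in a few lines. This is a minimal sketch in Python with NumPy; the function name `posterior` and the variable names are assumptions made for illustration.

```python
import numpy as np

def posterior(A, D, y):
    """Return P(x | y) for the observed outcome index y.

    A: likelihood matrix, rows indexed by y and columns by x.
    D: prior vector over x.
    """
    numerator = A[y] * D          # likelihood times prior, element-wise
    evidence = numerator.sum()    # marginalize over x
    return numerator / evidence

A = np.array([[0.7, 0.1],
              [0.3, 0.9]])
D = np.array([0.98, 0.02])

post_strong = posterior(A, D, y=1)
print("P(x | y=1):", np.round(post_strong, 4))
```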

5.6 discussion

Initially, the mouse believed that there was only a 2% chance that the fox was present. After receiving the strong smell, the mouse’s belief has been updated to a 5.77% chance that the fox is present. The strong smell has increased the mouse’s belief in the presence of the fox, but it is still more likely that the fox is absent.

What if the mouse’s prior belief were different? For example, if the mouse believed that there was a 50% chance that the fox was present, then the posterior would be:

P(x\mid y=1) = \frac{\begin{bmatrix}0.3 \\ 0.9 \end{bmatrix} \odot \begin{bmatrix}0.5 \\ 0.5 \end{bmatrix}}{0.6} = \frac{\begin{bmatrix}0.15 \\ 0.45 \end{bmatrix}}{0.6} = \begin{bmatrix}0.25 \\ 0.75 \end{bmatrix}.

Now, after receiving the strong smell, the mouse’s belief that the fox is present has been updated from 50% to a 75% chance. A single piece of sensory evidence can have a very different impact on the mouse’s belief depending on its prior belief!
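To see how strongly the prior shapes the result, one can sweep over several prior values while keeping the same strong-smell observation. The sketch below is in Python with NumPy; the function `posterior_present` and the extra prior values 0.1 and 0.9 are illustrative choices, not taken from the text.

```python
import numpy as np

def posterior_present(prior_present, p_strong_absent=0.3, p_strong_present=0.9):
    """P(x=1 | y=1) as a function of the prior P(x=1)."""
    numerator = p_strong_present * prior_present
    evidence = numerator + p_strong_absent * (1.0 - prior_present)
    return numerator / evidence

# 0.02 and 0.5 are the two priors from the text; the others are extra examples.
priors = np.array([0.02, 0.1, 0.5, 0.9])
posteriors = posterior_present(priors)

for p, q in zip(priors, posteriors):
    print(f"prior {p:.2f} -> posterior {q:.4f}")
```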

It is also interesting to ask what the mouse would believe if it received a faint smell (y=0) instead of a strong one, assuming the first prior, where the probability of the fox being present is 2%. In that case, the likelihood would be:

P(y=0\mid x) = \begin{bmatrix}0.7 \\ 0.1 \end{bmatrix}.

The numerator of Bayes’ rule would be:

P(y=0\mid x)P(x) = \begin{bmatrix}0.7 \\ 0.1 \end{bmatrix} \odot \begin{bmatrix}0.98 \\ 0.02 \end{bmatrix} = \begin{bmatrix}0.686 \\ 0.002 \end{bmatrix}.

The evidence would be:

P(y=0) = 0.686 + 0.002 = 0.688.

The posterior would be:

P(x\mid y=0) = \frac{\begin{bmatrix}0.686 \\ 0.002 \end{bmatrix}}{0.688} = \begin{bmatrix}0.9971 \\ 0.0029 \end{bmatrix}.
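The faint-smell and strong-smell cases can be computed side by side by building the table of numerators for every pair (y, x) and normalizing each row. This is a sketch in Python with NumPy, with variable names chosen for illustration; it relies on NumPy broadcasting to scale each column of the likelihood matrix by the matching prior entry.

```python
import numpy as np

A = np.array([[0.7, 0.1],    # y=0 (faint smell)
              [0.3, 0.9]])   # y=1 (strong smell)
D = np.array([0.98, 0.02])   # prior over x

# Column j of A is multiplied by D[j]: entry (i, j) is P(y=i | x=j) P(x=j).
joint = A * D

# Row sums are the evidence for each outcome; dividing normalizes each row.
evidence = joint.sum(axis=1, keepdims=True)
posterior_table = joint / evidence

print("P(x | y=0):", np.round(posterior_table[0], 4))
print("P(x | y=1):", np.round(posterior_table[1], 4))
```

Row 0 of `posterior_table` is the posterior after a faint smell and row 1 the posterior after a strong smell.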

A strong smell raises the probability of fox presence by more percentage points than a faint smell lowers it: starting from the 2% prior, a strong smell adds 3.77 percentage points, while a faint smell removes 1.71. This is partly because the prior probability was already very low, so there was little room left to go down. A better measure of the evidential force of an observation is the likelihood ratio, the factor by which the observation multiplies the odds P(x=1)/P(x=0). In this example, a strong smell multiplies the odds of fox presence by 3, while a faint smell multiplies them by 0.1/0.7, that is, divides them by 7. By this measure the faint smell is the stronger piece of evidence, pushing belief firmly away from fox presence. Let’s see why:

\begin{align*} \text{strong: likelihood ratio} &= \frac{P(y=1\mid x=1)}{P(y=1\mid x=0)} = \frac{0.9}{0.3} = 3 \\ \text{faint: likelihood ratio} &= \frac{P(y=0\mid x=1)}{P(y=0\mid x=0)} = \frac{0.1}{0.7} \approx 0.143. \end{align*}
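The likelihood ratios can be checked against the posteriors computed earlier by using the odds form of Bayes’ rule: posterior odds equal the likelihood ratio times the prior odds. The following is a small sketch in plain Python; the helper names `to_odds` and `to_prob` are hypothetical and introduced only for this illustration.

```python
def to_odds(p):
    """Convert a probability into odds in favor."""
    return p / (1.0 - p)

def to_prob(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1.0 + odds)

prior_present = 0.02      # P(x=1) from the first prior
lr_strong = 0.9 / 0.3     # likelihood ratio for y=1
lr_faint = 0.1 / 0.7      # likelihood ratio for y=0

# Posterior odds = likelihood ratio * prior odds.
post_strong = to_prob(lr_strong * to_odds(prior_present))
post_faint = to_prob(lr_faint * to_odds(prior_present))

print(f"P(x=1 | y=1) = {post_strong:.4f}")
print(f"P(x=1 | y=0) = {post_faint:.4f}")
```

Both values agree with the posteriors obtained from the full Bayes’ rule calculation, which is a useful sanity check on the arithmetic.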

At the beginning, I called this an easy case of Bayesian inference. It is easy because the hidden state takes only two values, so the evidence is a sum of two terms and Bayes’ rule can be evaluated exactly. As we will see in the next chapter, this luxury is rare. In most cases, we will have to use approximate inference methods, which give up some accuracy in exchange for being computationally tractable.