Kalman & Particle Filters

State-space inference: hidden state evolves over time, observations arrive one step at a time. From the linear-Gaussian closed form (Kalman) through local linearization (EKF, UKF) to fully nonlinear, non-Gaussian particle filters.

Filtering is the time-recursive cousin of posterior inference. The unknown is a hidden state $x_k$ that evolves over time according to some dynamics $x_k = f(x_{k-1}) + w_k$; what you see are noisy observations $y_k = h(x_k) + v_k$. At each step, the goal is the filtering posterior $p(x_k \mid y_{1:k})$ — the belief over the current state given everything observed so far. The four methods below differ in what they assume about $f$, $h$, and the noise, and in how they represent the belief.

Why this is a separate page. The Monte Carlo & MCMC page covers static targets — a posterior $p(\theta \mid y)$ that doesn't change as new data arrives. Filtering is a different problem shape: the target moves with time and you want a recursive update, not a batch sampler. The two pages share the importance-sampling machinery (the particle filter is sequential IS with resampling), but otherwise solve different problems.

1. The state-space model

A discrete-time state-space model is two equations:

$$ x_k = f(x_{k-1}) + w_k,\qquad y_k = h(x_k) + v_k, $$

where $w_k$ and $v_k$ are process and observation noise. The filtering job is to compute $p(x_k \mid y_{1:k})$ sequentially, updating the belief by one observation at a time. Two recursions do the work:

Whether either integral has a closed form depends on the assumptions about $f$, $h$, and the noise. The four methods below sit on a tradeoff ladder: start with an exact Gaussian closed form, then relax linearity, then relax the local-linearization requirement, then finally relax the Gaussian-belief restriction entirely.

StepWhat changesRequirementWhat can fail
Kalman Compute the filtering posterior exactly as mean and covariance. Linear dynamics, linear observations, Gaussian noise. Nonlinear or non-Gaussian systems violate the model.
EKF Allow nonlinear dynamics/observations by linearizing locally. Jacobians and a posterior narrow enough for local linearization. Curvature over the uncertainty region can bias or destabilize the filter.
UKF Push sigma points through the nonlinear functions instead of using Jacobians. A unimodal Gaussian belief is still a good summary. Multimodal or heavy-tailed posteriors get collapsed into one Gaussian.
Particle filter Represent the posterior with weighted samples instead of one Gaussian. Enough particles, likelihood evaluations, and resampling control. Weight collapse, sample impoverishment, and poor scaling with state dimension.

2. Closed-form filtering: the Kalman family

closed-form Kalman filter (linear-Gaussian)

If $f$ and $h$ are linear ($x_k = Fx_{k-1} + w_k,\ y_k = Hx_k + v_k$) and the noises are Gaussian, the filtering distribution stays Gaussian forever and is given exactly by:

$$ \begin{aligned} \hat x_k^- &= F\hat x_{k-1}, \quad P_k^- = F P_{k-1} F^\top + Q & \text{(predict)} \\ K_k &= P_k^- H^\top (H P_k^- H^\top + R)^{-1} & \text{(gain)} \\ \hat x_k &= \hat x_k^- + K_k(y_k - H\hat x_k^-) \\ P_k &= (I - K_k H)\,P_k^- & \text{(update)} \end{aligned} $$

This is not a Monte Carlo method; it is the closed-form posterior for the linear-Gaussian special case. Use it when you can.

closed-form Extended Kalman filter (EKF)

For mildly nonlinear $f,h$, linearize at the current mean: $F_k = \partial f/\partial x\big|_{\hat x_{k-1}}$, $H_k = \partial h/\partial x\big|_{\hat x_k^-}$, then apply the standard Kalman equations. Cheap and widely deployed, but a poor approximation when the local linearization disagrees with the true function over the spread of the prior.

closed-form Unscented Kalman filter (UKF)

Instead of linearizing the function, propagate a deterministic set of $2n+1$ sigma points through the true nonlinear $f,h$ and refit a Gaussian to their image. Second-order accurate, requires no Jacobians, same cost as EKF. Still fundamentally Gaussian, and fails for multimodal or heavy-tailed posteriors.

Figure 1 compares the filtering approximations on one nonlinear update. Kalman-style methods keep only a Gaussian belief; the difference is how they push that Gaussian through nonlinear dynamics and observations.

Figure 1 · Kalman, EKF, and UKF through a nonlinear observation
true posterior shape EKF Gaussian UKF Gaussian

3. Sequential Monte Carlo

When the state-space model is genuinely nonlinear or non-Gaussian, no closed form exists and Gaussian approximations break. SMC represents $p(x_{0:k}\mid y_{1:k})$ by a weighted cloud of particles $\{(x^{(i)}_{0:k}, w^{(i)}_k)\}_{i=1}^N$ and updates the cloud as each new observation arrives.

smc Sequential importance sampling (SIS)

Apply importance sampling step by step. With proposal $\pi(x_k\mid x_{0:k-1}, y_{1:k})$, the weights update recursively:

$$ w_k^{(i)} \propto w_{k-1}^{(i)} \cdot \frac{p(y_k\mid x_k^{(i)})\,p(x_k^{(i)}\mid x_{k-1}^{(i)})}{\pi(x_k^{(i)}\mid x_{0:k-1}^{(i)}, y_{1:k})}. $$

The catastrophe: after a few steps almost all weight ends up on one particle. This is weight degeneracy, and it is the reason SIS by itself is essentially never used.

smc Particle filter / SIR (bootstrap)

Add a resampling step: whenever the effective sample size drops below a threshold, draw $N$ new particles with replacement from the weighted cloud, duplicating heavy particles and killing light ones, and reset all weights to $1/N$. The simplest variant, the bootstrap filter, uses the prior $p(x_k\mid x_{k-1}^{(i)})$ as the proposal, which collapses the weight update to just the likelihood:

$$ w_k^{(i)} \propto p(y_k\mid x_k^{(i)}). $$
Bootstrap particle filter
  1. For each particle $i$: draw $x_k^{(i)} \sim p(x_k \mid x_{k-1}^{(i)})$ from the dynamics.
  2. Reweight: $w_k^{(i)} \propto p(y_k \mid x_k^{(i)})$.
  3. Normalize weights; compute ESS.
  4. If ESS < threshold: resample $N$ particles $\propto w_k^{(i)}$ and reset weights to $1/N$.

Variants improve on the bootstrap proposal: the optimal proposal $\pi(x_k\mid x_{k-1}^{(i)}, y_k)$ minimizes weight variance but is tractable only for special models; local-linearization proposals use an EKF or UKF on each particle to construct a locally near-optimal Gaussian proposal. Resampling itself has variants (multinomial, stratified, systematic, residual) that differ in variance.

Resampling cures weight degeneracy but introduces sample impoverishment: heavy particles get duplicated and the cloud loses diversity. Roughening (jittering particles by a small noise) and MCMC moves between resampling steps are the standard remedies.

Figure 2 runs SIS and a bootstrap particle filter on the same nonlinear tracking problem, so the weight-collapse failure and the resampling fix are visible side by side.

Figure 2 · Bootstrap particle filter on a nonlinear scalar SSM
true state observation $y_k$ particle cloud posterior mean
Figure 3 · Particle genealogy after repeated resampling
surviving lineages extinct lineages

4. Shared filtering structure

All four algorithms share the same predict-then-update recursion. They differ in how the predict pushes the belief forward, how the update incorporates $y_k$, and what kind of belief they carry.

Kalman filter EKF UKF Particle filter (SIR)
Family linear-Gaussian closed form local-linearization closed form sigma-point closed form weighted-sample SMC
Requires linear $f, h$; Gaussian noise smooth $f, h$; Jacobians $\partial f, \partial h$ $f, h$ evaluable (no Jacobians) $f$ sampleable; $p(y\mid x)$ evaluable
Belief Gaussian $(\hat x, P)$ Gaussian $(\hat x, P)$ Gaussian $(\hat x, P)$ weighted cloud $\{(x^{(i)}, w^{(i)})\}$
Predict $\hat x^- = F\hat x;\ P^- = F P F^\top + Q$ $F_k = \partial f/\partial x;\ \hat x^- = f(\hat x);\ P^- = F_k P F_k^\top + Q$ $\sigma_j \leftarrow$ sigma points; $\sigma'_j = f(\sigma_j)$; refit Gaussian $x^{(i)}_k \sim p(x_k \mid x^{(i)}_{k-1})$
Update Kalman gain on $y_k - H\hat x^-$ Kalman gain with $H_k = \partial h/\partial x$ Kalman gain via $\gamma_j = h(\sigma'_j)$ $w^{(i)} \propto w^{(i)}_{k-1}\,p(y_k \mid x^{(i)}_k)$
Resample if $\mathrm{ESS}(w) < \tau$: resample $\propto w$; reset $w \leftarrow 1/N$
Main failure any departure from linearity / Gaussianity local-linearization error; divergence under strong nonlinearity still Gaussian; fails on multimodal / heavy-tailed posteriors weight degeneracy; sample impoverishment after resampling

Each column relaxes one assumption from the column to its left. Reading across Predict shows the relaxation explicitly: matrix multiply by a fixed $F$ becomes a Jacobian-at-the-mean, then a Jacobian-free sigma-point propagation, then a non-Gaussian particle propagation. The following cards zoom in on each step.

5. Pair by pair: what changes between methods

Three transitions, each a single design swap. The pseudocode bodies have the same structure; only the highlighted lines change.

Kalman ↔ Extended Kalman filter

Replace the linear $F, H$ with their Jacobians at the current mean.

Kalman filter
x̂_0, P_0 = prior
for k = 1 to K:
    # predict
    x̂⁻ = F · x̂_{k-1}
    P⁻ = F · P_{k-1} · Fᵀ + Q
    # update
    K = P⁻ · Hᵀ · (H · P⁻ · Hᵀ + R)⁻¹
    x̂_k = x̂⁻ + K · (y_k − H · x̂⁻)
    P_k = (I − K · H) · P⁻
return chain (x̂_k, P_k)
Extended Kalman filter
x̂_0, P_0 = prior
for k = 1 to K:
    # predict (linearize dynamics)
    F_k = ∂f/∂x |_{x̂_{k-1}}
    x̂⁻ = f(x̂_{k-1})
    P⁻ = F_k · P_{k-1} · F_kᵀ + Q
    # update (linearize observation)
    H_k = ∂h/∂x |_{x̂⁻}
    K = P⁻ · H_kᵀ · (H_k · P⁻ · H_kᵀ + R)⁻¹
    x̂_k = x̂⁻ + K · (y_k − h(x̂⁻))
    P_k = (I − K · H_k) · P⁻
return chain (x̂_k, P_k)

Most of the differences are local substitutions: $F \to F_k$, $H \to H_k$ (now computed at each step rather than constant), and the state-mean propagations $F\hat x$ and $H\hat x$ are replaced by the nonlinear function evaluations $f(\hat x)$ and $h(\hat x)$. Two Jacobian lines are added. The Kalman-gain formula and the covariance update are structurally identical; they just use the Jacobians in place of the constant matrices.

Extended Kalman ↔ Unscented Kalman filter

Replace Jacobian linearization with deterministic sigma-point propagation through the true nonlinear functions.

Extended Kalman filter
x̂_0, P_0 = prior
for k = 1 to K:
    # predict
    F_k = ∂f/∂x |_{x̂_{k-1}}
    x̂⁻ = f(x̂_{k-1})
    P⁻ = F_k · P_{k-1} · F_kᵀ + Q
    # update
    H_k = ∂h/∂x |_{x̂⁻}
    K = P⁻ · H_kᵀ · (H_k · P⁻ · H_kᵀ + R)⁻¹
    x̂_k = x̂⁻ + K · (y_k − h(x̂⁻))
    P_k = (I − K · H_k) · P⁻
return chain (x̂_k, P_k)
Unscented Kalman filter
x̂_0, P_0 = prior
for k = 1 to K:
    # predict
    σ_j = sigma_points(x̂_{k-1}, P_{k-1})
    σ'_j = f(σ_j)        # pass through true f
    x̂⁻ = Σ_j w_j · σ'_j
    P⁻ = Σ_j w_j · (σ'_j − x̂⁻)(σ'_j − x̂⁻)ᵀ + Q
    # update
    γ_j = h(σ'_j)        # pass through true h
    C = Σ_j w_j · (σ'_j − x̂⁻)(γ_j − γ̄)ᵀ
    S = Σ_j w_j · (γ_j − γ̄)(γ_j − γ̄)ᵀ + R
    K = C · S⁻¹
    x̂_k = x̂⁻ + K · (y_k − γ̄)
    P_k = P⁻ − K · S · Kᵀ
return chain (x̂_k, P_k)

The Jacobian-and-multiply pattern is replaced by sigma-point-and-sample-cov: instead of approximating $f$ by its first-order Taylor expansion at the mean, the UKF picks $2n+1$ deterministic points around the mean, pushes each through the true $f$ and $h$, and refits a Gaussian to the image. The Kalman-gain identity reappears in the update, but built from sample covariances of the propagated sigma points rather than from Jacobian-and-prior products. Same cost as EKF, second-order accurate, no Jacobians needed.

Unscented Kalman ↔ Particle filter (bootstrap)

Replace the Gaussian belief with a weighted particle cloud; replace Kalman gain with likelihood reweighting; add resampling.

Unscented Kalman filter
x̂_0, P_0 = prior Gaussian
for k = 1 to K:
    # predict
    σ_j = sigma_points(x̂_{k-1}, P_{k-1})
    σ'_j = f(σ_j)
    x̂⁻, P⁻ = mean & cov of σ'_j;  P⁻ += Q
    # update
    γ_j = h(σ'_j)
    K = cov(σ'_j, γ_j) · (cov(γ_j) + R)⁻¹
    x̂_k = x̂⁻ + K · (y_k − mean(γ_j))
    P_k = P⁻ − K · cov(γ_j, σ'_j)

return chain (x̂_k, P_k)
Particle filter (bootstrap)
{x^(i)_0}_{i=1..N}, w^(i) = 1/N    # cloud, not Gaussian
for k = 1 to K:
    # predict
    for each i:
        x^(i)_k ~ p(x_k | x^(i)_{k-1})
                                  # sample from dynamics
    # update
    for each i:
        w^(i) ∝ w^(i)_{k-1} · p(y_k | x^(i)_k)
    normalize w^(i)
    if ESS(w) < τ:
        resample N particles ∝ w
        w^(i) ← 1/N
return chain of clouds {x^(i)_k, w^(i)_k}

The largest of the three transitions: the belief representation changes (Gaussian sufficient statistics → weighted particle cloud), the predict step changes from deterministic sigma-point propagation to actual samples from the dynamics, the update step changes from a Kalman gain to a likelihood reweighting, and a resampling step is added to fight weight degeneracy. The shared structure is just the outer `for k` loop and the predict/update labels. This is what it takes to leave the Gaussian world.

6. Filter choice by model

The decision is mostly about how nonlinear the dynamics are and whether the posterior stays unimodal. Tags echo the section colors: closed-form means a Gaussian closed-form posterior (no sampling), smc means weighted particles updated sequentially.

MethodUse whenStrengthsPitfalls
Kalman filter
closed-form
Linear-Gaussian state-space model. Filtering / smoothing of $x_{0:T}$ given $y_{1:T}$. Exact; closed-form $O(d^3)$ per step. Restricted to linear dynamics and Gaussian noise.
EKF
closed-form
Mildly nonlinear state-space; tracking, robotics, navigation. Drop-in replacement for KF; cheap. First-order linearization error; can diverge under strong nonlinearity.
UKF
closed-form
More nonlinear state-space than EKF can handle but still unimodal Gaussian. Second-order accurate; no Jacobians required. Still a Gaussian approximation; fails for multimodal / heavy-tailed posteriors.
Particle filter (SIR)
smc
Nonlinear, non-Gaussian state-space; tracking, robotics, gene networks, anything multimodal over time. Asymptotically exact for fully general SSMs; handles multimodal posteriors. Sample impoverishment after resampling; scales poorly with state dimension.
Quick decision tree.
  1. Linear dynamics and Gaussian noise? → Kalman filter.
  2. Mildly nonlinear, Jacobians cheap and well-behaved? → EKF.
  3. More nonlinear but still unimodal Gaussian belief is OK? → UKF.
  4. Genuinely nonlinear / non-Gaussian / multimodal posterior? → particle filter.
  5. Need to do inference on a static posterior (no time dimension)? → see Monte Carlo & MCMC.

What next

Filtering connects to static-posterior sampling, dependency structure, and the linear systems story.