Kalman & Particle Filters
Filtering is the time-recursive cousin of posterior inference. The unknown is a hidden state $x_k$ that evolves over time according to some dynamics $x_k = f(x_{k-1}) + w_k$; what you see are noisy observations $y_k = h(x_k) + v_k$. At each step, the goal is the filtering posterior $p(x_k \mid y_{1:k})$ — the belief over the current state given everything observed so far. The four methods below differ in what they assume about $f$, $h$, and the noise, and in how they represent the belief.
1. The state-space model
A discrete-time state-space model is two equations:
$$ x_k = f(x_{k-1}) + w_k,\qquad y_k = h(x_k) + v_k, $$where $w_k$ and $v_k$ are process and observation noise. The filtering job is to compute $p(x_k \mid y_{1:k})$ sequentially, updating the belief by one observation at a time. Two recursions do the work:
- Predict: push the belief forward through the dynamics. $p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\,p(x_{k-1} \mid y_{1:k-1})\,dx_{k-1}$.
- Update: tilt the prediction by the new observation's likelihood. $p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k)\,p(x_k \mid y_{1:k-1})$.
Whether either integral has a closed form depends on the assumptions about $f$, $h$, and the noise. The four methods below sit on a tradeoff ladder: start with an exact Gaussian closed form, then relax linearity, then relax the local-linearization requirement, then finally relax the Gaussian-belief restriction entirely.
| Step | What changes | Requirement | What can fail |
|---|---|---|---|
| Kalman | Compute the filtering posterior exactly as mean and covariance. | Linear dynamics, linear observations, Gaussian noise. | Nonlinear or non-Gaussian systems violate the model. |
| EKF | Allow nonlinear dynamics/observations by linearizing locally. | Jacobians and a posterior narrow enough for local linearization. | Curvature over the uncertainty region can bias or destabilize the filter. |
| UKF | Push sigma points through the nonlinear functions instead of using Jacobians. | A unimodal Gaussian belief is still a good summary. | Multimodal or heavy-tailed posteriors get collapsed into one Gaussian. |
| Particle filter | Represent the posterior with weighted samples instead of one Gaussian. | Enough particles, likelihood evaluations, and resampling control. | Weight collapse, sample impoverishment, and poor scaling with state dimension. |
2. Closed-form filtering: the Kalman family
closed-form Kalman filter (linear-Gaussian)
If $f$ and $h$ are linear ($x_k = Fx_{k-1} + w_k,\ y_k = Hx_k + v_k$) and the noises are Gaussian, the filtering distribution stays Gaussian forever and is given exactly by:
$$ \begin{aligned} \hat x_k^- &= F\hat x_{k-1}, \quad P_k^- = F P_{k-1} F^\top + Q & \text{(predict)} \\ K_k &= P_k^- H^\top (H P_k^- H^\top + R)^{-1} & \text{(gain)} \\ \hat x_k &= \hat x_k^- + K_k(y_k - H\hat x_k^-) \\ P_k &= (I - K_k H)\,P_k^- & \text{(update)} \end{aligned} $$This is not a Monte Carlo method; it is the closed-form posterior for the linear-Gaussian special case. Use it when you can.
closed-form Extended Kalman filter (EKF)
For mildly nonlinear $f,h$, linearize at the current mean: $F_k = \partial f/\partial x\big|_{\hat x_{k-1}}$, $H_k = \partial h/\partial x\big|_{\hat x_k^-}$, then apply the standard Kalman equations. Cheap and widely deployed, but a poor approximation when the local linearization disagrees with the true function over the spread of the prior.
closed-form Unscented Kalman filter (UKF)
Instead of linearizing the function, propagate a deterministic set of $2n+1$ sigma points through the true nonlinear $f,h$ and refit a Gaussian to their image. Second-order accurate, requires no Jacobians, same cost as EKF. Still fundamentally Gaussian, and fails for multimodal or heavy-tailed posteriors.
Figure 1 compares the filtering approximations on one nonlinear update. Kalman-style methods keep only a Gaussian belief; the difference is how they push that Gaussian through nonlinear dynamics and observations.
3. Sequential Monte Carlo
When the state-space model is genuinely nonlinear or non-Gaussian, no closed form exists and Gaussian approximations break. SMC represents $p(x_{0:k}\mid y_{1:k})$ by a weighted cloud of particles $\{(x^{(i)}_{0:k}, w^{(i)}_k)\}_{i=1}^N$ and updates the cloud as each new observation arrives.
smc Sequential importance sampling (SIS)
Apply importance sampling step by step. With proposal $\pi(x_k\mid x_{0:k-1}, y_{1:k})$, the weights update recursively:
$$ w_k^{(i)} \propto w_{k-1}^{(i)} \cdot \frac{p(y_k\mid x_k^{(i)})\,p(x_k^{(i)}\mid x_{k-1}^{(i)})}{\pi(x_k^{(i)}\mid x_{0:k-1}^{(i)}, y_{1:k})}. $$The catastrophe: after a few steps almost all weight ends up on one particle. This is weight degeneracy, and it is the reason SIS by itself is essentially never used.
smc Particle filter / SIR (bootstrap)
Add a resampling step: whenever the effective sample size drops below a threshold, draw $N$ new particles with replacement from the weighted cloud, duplicating heavy particles and killing light ones, and reset all weights to $1/N$. The simplest variant, the bootstrap filter, uses the prior $p(x_k\mid x_{k-1}^{(i)})$ as the proposal, which collapses the weight update to just the likelihood:
$$ w_k^{(i)} \propto p(y_k\mid x_k^{(i)}). $$- For each particle $i$: draw $x_k^{(i)} \sim p(x_k \mid x_{k-1}^{(i)})$ from the dynamics.
- Reweight: $w_k^{(i)} \propto p(y_k \mid x_k^{(i)})$.
- Normalize weights; compute ESS.
- If ESS < threshold: resample $N$ particles $\propto w_k^{(i)}$ and reset weights to $1/N$.
Variants improve on the bootstrap proposal: the optimal proposal $\pi(x_k\mid x_{k-1}^{(i)}, y_k)$ minimizes weight variance but is tractable only for special models; local-linearization proposals use an EKF or UKF on each particle to construct a locally near-optimal Gaussian proposal. Resampling itself has variants (multinomial, stratified, systematic, residual) that differ in variance.
Resampling cures weight degeneracy but introduces sample impoverishment: heavy particles get duplicated and the cloud loses diversity. Roughening (jittering particles by a small noise) and MCMC moves between resampling steps are the standard remedies.
Figure 2 runs SIS and a bootstrap particle filter on the same nonlinear tracking problem, so the weight-collapse failure and the resampling fix are visible side by side.
4. Shared filtering structure
All four algorithms share the same predict-then-update recursion. They differ in how the predict pushes the belief forward, how the update incorporates $y_k$, and what kind of belief they carry.
| Kalman filter | EKF | UKF | Particle filter (SIR) | |
|---|---|---|---|---|
| Family | linear-Gaussian closed form | local-linearization closed form | sigma-point closed form | weighted-sample SMC |
| Requires | linear $f, h$; Gaussian noise | smooth $f, h$; Jacobians $\partial f, \partial h$ | $f, h$ evaluable (no Jacobians) | $f$ sampleable; $p(y\mid x)$ evaluable |
| Belief | Gaussian $(\hat x, P)$ | Gaussian $(\hat x, P)$ | Gaussian $(\hat x, P)$ | weighted cloud $\{(x^{(i)}, w^{(i)})\}$ |
| Predict | $\hat x^- = F\hat x;\ P^- = F P F^\top + Q$ | $F_k = \partial f/\partial x;\ \hat x^- = f(\hat x);\ P^- = F_k P F_k^\top + Q$ | $\sigma_j \leftarrow$ sigma points; $\sigma'_j = f(\sigma_j)$; refit Gaussian | $x^{(i)}_k \sim p(x_k \mid x^{(i)}_{k-1})$ |
| Update | Kalman gain on $y_k - H\hat x^-$ | Kalman gain with $H_k = \partial h/\partial x$ | Kalman gain via $\gamma_j = h(\sigma'_j)$ | $w^{(i)} \propto w^{(i)}_{k-1}\,p(y_k \mid x^{(i)}_k)$ |
| Resample | — | — | — | if $\mathrm{ESS}(w) < \tau$: resample $\propto w$; reset $w \leftarrow 1/N$ |
| Main failure | any departure from linearity / Gaussianity | local-linearization error; divergence under strong nonlinearity | still Gaussian; fails on multimodal / heavy-tailed posteriors | weight degeneracy; sample impoverishment after resampling |
Each column relaxes one assumption from the column to its left. Reading across Predict shows the relaxation explicitly: matrix multiply by a fixed $F$ becomes a Jacobian-at-the-mean, then a Jacobian-free sigma-point propagation, then a non-Gaussian particle propagation. The following cards zoom in on each step.
5. Pair by pair: what changes between methods
Three transitions, each a single design swap. The pseudocode bodies have the same structure; only the highlighted lines change.
Kalman ↔ Extended Kalman filter
Replace the linear $F, H$ with their Jacobians at the current mean.
x̂_0, P_0 = prior
for k = 1 to K:
# predict
x̂⁻ = F · x̂_{k-1}
P⁻ = F · P_{k-1} · Fᵀ + Q
# update
K = P⁻ · Hᵀ · (H · P⁻ · Hᵀ + R)⁻¹
x̂_k = x̂⁻ + K · (y_k − H · x̂⁻)
P_k = (I − K · H) · P⁻
return chain (x̂_k, P_k)
x̂_0, P_0 = prior
for k = 1 to K:
# predict (linearize dynamics)
F_k = ∂f/∂x |_{x̂_{k-1}}
x̂⁻ = f(x̂_{k-1})
P⁻ = F_k · P_{k-1} · F_kᵀ + Q
# update (linearize observation)
H_k = ∂h/∂x |_{x̂⁻}
K = P⁻ · H_kᵀ · (H_k · P⁻ · H_kᵀ + R)⁻¹
x̂_k = x̂⁻ + K · (y_k − h(x̂⁻))
P_k = (I − K · H_k) · P⁻
return chain (x̂_k, P_k)
Most of the differences are local substitutions: $F \to F_k$, $H \to H_k$ (now computed at each step rather than constant), and the state-mean propagations $F\hat x$ and $H\hat x$ are replaced by the nonlinear function evaluations $f(\hat x)$ and $h(\hat x)$. Two Jacobian lines are added. The Kalman-gain formula and the covariance update are structurally identical; they just use the Jacobians in place of the constant matrices.
Extended Kalman ↔ Unscented Kalman filter
Replace Jacobian linearization with deterministic sigma-point propagation through the true nonlinear functions.
x̂_0, P_0 = prior
for k = 1 to K:
# predict
F_k = ∂f/∂x |_{x̂_{k-1}}
x̂⁻ = f(x̂_{k-1})
P⁻ = F_k · P_{k-1} · F_kᵀ + Q
# update
H_k = ∂h/∂x |_{x̂⁻}
K = P⁻ · H_kᵀ · (H_k · P⁻ · H_kᵀ + R)⁻¹
x̂_k = x̂⁻ + K · (y_k − h(x̂⁻))
P_k = (I − K · H_k) · P⁻
return chain (x̂_k, P_k)
x̂_0, P_0 = prior
for k = 1 to K:
# predict
σ_j = sigma_points(x̂_{k-1}, P_{k-1})
σ'_j = f(σ_j) # pass through true f
x̂⁻ = Σ_j w_j · σ'_j
P⁻ = Σ_j w_j · (σ'_j − x̂⁻)(σ'_j − x̂⁻)ᵀ + Q
# update
γ_j = h(σ'_j) # pass through true h
C = Σ_j w_j · (σ'_j − x̂⁻)(γ_j − γ̄)ᵀ
S = Σ_j w_j · (γ_j − γ̄)(γ_j − γ̄)ᵀ + R
K = C · S⁻¹
x̂_k = x̂⁻ + K · (y_k − γ̄)
P_k = P⁻ − K · S · Kᵀ
return chain (x̂_k, P_k)
The Jacobian-and-multiply pattern is replaced by sigma-point-and-sample-cov: instead of approximating $f$ by its first-order Taylor expansion at the mean, the UKF picks $2n+1$ deterministic points around the mean, pushes each through the true $f$ and $h$, and refits a Gaussian to the image. The Kalman-gain identity reappears in the update, but built from sample covariances of the propagated sigma points rather than from Jacobian-and-prior products. Same cost as EKF, second-order accurate, no Jacobians needed.
Unscented Kalman ↔ Particle filter (bootstrap)
Replace the Gaussian belief with a weighted particle cloud; replace Kalman gain with likelihood reweighting; add resampling.
x̂_0, P_0 = prior Gaussian
for k = 1 to K:
# predict
σ_j = sigma_points(x̂_{k-1}, P_{k-1})
σ'_j = f(σ_j)
x̂⁻, P⁻ = mean & cov of σ'_j; P⁻ += Q
# update
γ_j = h(σ'_j)
K = cov(σ'_j, γ_j) · (cov(γ_j) + R)⁻¹
x̂_k = x̂⁻ + K · (y_k − mean(γ_j))
P_k = P⁻ − K · cov(γ_j, σ'_j)
return chain (x̂_k, P_k)
{x^(i)_0}_{i=1..N}, w^(i) = 1/N # cloud, not Gaussian
for k = 1 to K:
# predict
for each i:
x^(i)_k ~ p(x_k | x^(i)_{k-1})
# sample from dynamics
# update
for each i:
w^(i) ∝ w^(i)_{k-1} · p(y_k | x^(i)_k)
normalize w^(i)
if ESS(w) < τ:
resample N particles ∝ w
w^(i) ← 1/N
return chain of clouds {x^(i)_k, w^(i)_k}
The largest of the three transitions: the belief representation changes (Gaussian sufficient statistics → weighted particle cloud), the predict step changes from deterministic sigma-point propagation to actual samples from the dynamics, the update step changes from a Kalman gain to a likelihood reweighting, and a resampling step is added to fight weight degeneracy. The shared structure is just the outer `for k` loop and the predict/update labels. This is what it takes to leave the Gaussian world.
6. Filter choice by model
The decision is mostly about how nonlinear the dynamics are and whether the posterior stays unimodal. Tags echo the section colors: closed-form means a Gaussian closed-form posterior (no sampling), smc means weighted particles updated sequentially.
| Method | Use when | Strengths | Pitfalls |
|---|---|---|---|
| Kalman filter closed-form | Linear-Gaussian state-space model. Filtering / smoothing of $x_{0:T}$ given $y_{1:T}$. | Exact; closed-form $O(d^3)$ per step. | Restricted to linear dynamics and Gaussian noise. |
| EKF closed-form | Mildly nonlinear state-space; tracking, robotics, navigation. | Drop-in replacement for KF; cheap. | First-order linearization error; can diverge under strong nonlinearity. |
| UKF closed-form | More nonlinear state-space than EKF can handle but still unimodal Gaussian. | Second-order accurate; no Jacobians required. | Still a Gaussian approximation; fails for multimodal / heavy-tailed posteriors. |
| Particle filter (SIR) smc | Nonlinear, non-Gaussian state-space; tracking, robotics, gene networks, anything multimodal over time. | Asymptotically exact for fully general SSMs; handles multimodal posteriors. | Sample impoverishment after resampling; scales poorly with state dimension. |
- Linear dynamics and Gaussian noise? → Kalman filter.
- Mildly nonlinear, Jacobians cheap and well-behaved? → EKF.
- More nonlinear but still unimodal Gaussian belief is OK? → UKF.
- Genuinely nonlinear / non-Gaussian / multimodal posterior? → particle filter.
- Need to do inference on a static posterior (no time dimension)? → see Monte Carlo & MCMC.
What next
Filtering connects to static-posterior sampling, dependency structure, and the linear systems story.