Math Notes

Study notes on probability, information theory, random processes, and Bayesian inference.

Probability & Statistics

Foundations, estimation, dependence, and convergence.

Selective study notes; not externally reviewed. Some derivations and implementations still need checking.

10 items
Measure Theory & Random Variables
Measurable spaces, probability measures, pushforward measures, densities, and importance sampling
Notation: density form vs. measure-theoretic form
Density notation, measure-theoretic notation, and the places where the choice changes the statement
Named Distributions
How Bernoulli, Poisson, Gaussian, Cauchy, chi-square, t, F, conjugate priors, and heavy-tail laws are related
Modes of Convergence
Almost sure, in probability, in distribution, in L^p — the implication lattice, counterexamples as sample paths, Markov/Chebyshev/Chernoff bounds, and MCT/DCT/Fatou
Calculus of Variations
First variations, Euler-Lagrange residuals, curve relaxation, and the brachistochrone race
Sufficient Statistics
Fisher–Neyman factorization, the fiber picture, Rao–Blackwell variance collapse, and a categorical diagram showing factorization, variance, and Fisher information as one commuting square
The Exponential Family
The canonical form, naming and the statistical-physics log-partition story, derivatives of A giving the moments of T(X), canonical links (logit, log) behind GLMs, and an interactive picker stepping through six standard members
Fisher Information
Likelihood geometry, score functions, Fisher information, exponential families, log-partition, Jeffreys and max-entropy priors, and Bayesian updates
Hypothesis Testing
Type-I error, type-II error, power, and decision thresholds through the classic overlapping-distributions diagram
Distance Correlation
Distance-based dependence tests, partial distance correlation, and cases Pearson r misses
Paper notes
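
One claim in the list above is easy to check numerically: in an exponential family written in canonical form, derivatives of the log-partition function A give the moments of T(X). The following is a minimal sketch, assuming the Bernoulli family with T(x) = x and A(η) = log(1 + e^η). The helper names (`log_partition`, `central_diff`, `second_diff`) and the value p = 0.3 are illustrative and not taken from the notes.

```python
import math


def log_partition(eta: float) -> float:
    """A(eta) = log(1 + exp(eta)) for the Bernoulli family with T(x) = x."""
    return math.log1p(math.exp(eta))


def central_diff(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_diff(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


p = 0.3                              # illustrative success probability
eta = math.log(p / (1 - p))          # canonical (logit) link: eta = logit(p)

mean_from_A = central_diff(log_partition, eta)   # should approximate E[X] = p
var_from_A = second_diff(log_partition, eta)     # should approximate Var[X] = p(1 - p)

print(mean_from_A, var_from_A)
```

The first derivative recovers the mean p and the second recovers the variance p(1 - p), up to finite-difference error. Finite differences are used here only to avoid assuming the closed-form derivatives; an autodiff library would serve the same purpose.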

Bayesian Inference

Approximating intractable posteriors by sampling and by optimization.

Material draws on Prof. Ercan Kuruoğlu's Spring 2026 Bayesian Inference and Monte Carlo Simulation course at Tsinghua SIGS.

Selective study notes; not externally reviewed. Some use measure-theoretic notation alongside density notation. Some derivations and implementations still need checking.

12 items
Choosing a Prior
Principles of prior selection: use real prior information when you have it; otherwise group invariance, max entropy, or Jeffreys — and how the three routes disagree near boundaries
Conjugate Priors & the Exponential Family
Why some prior–likelihood pairs update in closed form, hyperparameters as pseudo-counts, worked Beta/Normal/Gamma examples, and a table of standard pairs
Posterior Summaries & Bayes Risk
Squared, absolute, and zero-one loss pick out the posterior mean, median, and mode — three views of the same posterior, only one of which ignores everything but the peak
Hierarchical Bayes
Two-level Normal–Normal model, the posterior formula for borrowing strength across groups, empirical-Bayes fitting of the between-group variance, and the connection to ridge regression
Bayesian Regression: Penalties as Priors
OLS, ridge, LASSO, and best-subset selection as MAP under four noise/prior pairs — and why the shape of the prior near zero determines whether the estimator shrinks, selects, or both
Bayesian Graphical Models
DAG factorization, d-separation, explaining away, Dirichlet-multinomial CPT learning, and structure scoring
Hidden Markov Models
HMM sampling, forward-backward filtering and smoothing, log-domain messages, and Viterbi versus marginal MAP paths
Monte Carlo & MCMC
Rejection, importance sampling, Metropolis-Hastings, Gibbs, RJMCMC, simulated annealing, and when to use each method on a static target
Kalman & Particle Filters
Sequential inference of a hidden state from noisy observations: the Kalman filter for linear-Gaussian models, the EKF (local linearization) and UKF (sigma-point approximation) for mildly nonlinear models, and the particle filter for fully nonlinear, non-Gaussian state-space models
Free Energy & Variational Inference
The free-energy/ELBO identity and how it turns posterior approximation into optimization
Variational Bayes for Gaussian Mixtures
CAVI for a 2-D Gaussian mixture with Normal–Wishart and Dirichlet priors, showing component ellipses, automatic pruning of unused components, and the ELBO trace
Bayesian Neural Networks
Weight posteriors, predictive function ensembles, Laplace approximation, evidence, Occam's hill, and prior mismatch
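
The statement in the list above that squared, absolute, and zero-one loss pick out the posterior mean, median, and mode can be checked by brute force on a small example. This is a minimal sketch under stated assumptions: the discrete posterior below is made up for illustration, the candidate actions are restricted to a grid, and the names (`expected_loss`, `bayes_action`) are hypothetical rather than taken from the notes.

```python
# Toy discrete posterior; the values and probabilities are invented for illustration.
values = [0, 1, 2, 3, 4]
probs = [0.40, 0.15, 0.10, 0.05, 0.30]


def expected_loss(action, loss):
    """Posterior expected loss of reporting `action`."""
    return sum(p * loss(action, x) for x, p in zip(values, probs))


def squared(a, x):
    return (a - x) ** 2


def absolute(a, x):
    return abs(a - x)


def zero_one(a, x):
    return 0.0 if a == x else 1.0


# Candidate point estimates: a grid from 0.0 to 4.0 in steps of 0.1.
candidates = [i / 10 for i in range(41)]


def bayes_action(loss):
    """Grid point minimizing the posterior expected loss."""
    return min(candidates, key=lambda a: expected_loss(a, loss))


best_squared = bayes_action(squared)     # lands on the posterior mean
best_absolute = bayes_action(absolute)   # lands on the posterior median
best_zero_one = bayes_action(zero_one)   # lands on the posterior mode

posterior_mean = sum(x * p for x, p in zip(values, probs))
print(best_squared, best_absolute, best_zero_one, posterior_mean)
```

On this toy posterior the three minimizers are 1.7, 1.0, and 0.0: the mean, the median, and the mode. Only the zero-one answer is unaffected by the 0.30 mass sitting at 4, which is the sense in which the mode ignores everything but the peak.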