
Kalman Filter from Scratch: Predicting the Future by Trusting (But Verifying) Noisy Sensors

The Problem — Noisy Sensors, Hidden Truth

Every sensor lies. Your GPS says you're at 40.7128° N, but the true position could be 10 meters off. Your accelerometer reads 9.81 m/s², but noise makes it jitter between 9.5 and 10.1. Your thermometer displays 72°F while the actual temperature wavers by half a degree in either direction. The data you observe is never the truth — it's the truth plus noise.

In 1960, Rudolf Kalman published a paper that changed engineering forever. His insight was deceptively simple: if you know how things move (a physics model) and how much your sensors lie (noise statistics), you can reconstruct the hidden truth better than any single measurement ever could. The algorithm he described — now called the Kalman filter — is the optimal recursive state estimator for linear systems with Gaussian noise, and its descendants guide every GPS receiver, every self-driving car, every spacecraft, and every smartphone on Earth.

Here's the setup. Some hidden state x (the true position, velocity, temperature — whatever you care about) evolves over time according to a dynamics model. You can't observe x directly. Instead, you get noisy measurements z that are related to x but corrupted by sensor noise. Your goal: estimate x as accurately as possible given all the measurements you've seen so far.

Think of it like tracking a car through dense fog. You can hear the engine — that's your noisy measurement. And you know cars tend to drive in straight lines at roughly constant speed — that's your dynamics model. Neither source is perfect, but together they give you a remarkably good picture of where the car actually is.

Let's make this concrete. We'll simulate a 1D object moving at constant velocity, then observe it through a noisy sensor:

import numpy as np

# Ground truth: object moves at constant velocity with slight random acceleration
np.random.seed(42)
dt = 1.0          # time step (seconds)
n_steps = 50
true_pos = np.zeros(n_steps)
true_vel = 2.0    # constant velocity (m/s)

for t in range(1, n_steps):
    process_noise = np.random.normal(0, 0.3)      # slight random acceleration
    true_vel_t = true_vel + process_noise
    true_pos[t] = true_pos[t-1] + true_vel_t * dt

# Noisy measurements: sensor adds Gaussian noise with std=5 meters
measurement_noise_std = 5.0
measurements = true_pos + np.random.normal(0, measurement_noise_std, n_steps)

# Plot: true position is smooth, measurements are a noisy mess
print(f"True position at t=25: {true_pos[25]:.1f} m")
print(f"Measured position at t=25: {measurements[25]:.1f} m")
print(f"Measurement error: {abs(measurements[25] - true_pos[25]):.1f} m")
# True position at t=25: 50.8 m
# Measured position at t=25: 55.2 m
# Measurement error: 4.4 m

Look at that gap between truth and measurement. The sensor is off by meters at every time step, and there's no obvious pattern to the errors. If you just plotted the raw measurements, you'd see a jagged, unreliable mess. Can we do better? Much better — with a few lines of linear algebra.

Predict-Update — The Two-Step Dance

The Kalman filter works by alternating between two steps, over and over, for every new measurement:

  1. Predict: Use the physics model to project the state forward in time. Uncertainty grows because we're not sure about the dynamics.
  2. Update: Incorporate a new measurement to correct the prediction. Uncertainty shrinks because we gained information.

The state is represented as a Gaussian distribution with two quantities: the mean x̂ (our best estimate) and the covariance P (our uncertainty about that estimate). A Gaussian is fully described by this pair, so the entire filter state fits compactly in (x̂, P).

The Predict Step

Given a state transition matrix F that encodes the physics ("position changes by velocity times dt"), we project forward:

x̂⁻ = F · x̂    (predicted state)
P⁻ = F · P · Fᵀ + Q    (predicted covariance)

The superscript minus means "before incorporating the measurement." Q is the process noise covariance — how much the real dynamics deviate from our model. Notice that Q is added to the projected covariance at every predict step: predicting without measuring leaves us less certain.
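
You can watch the uncertainty grow with a few lines of NumPy — a minimal sketch using the same constant-velocity F and Q values as the full filter later in this post:

```python
import numpy as np

# Three predict steps with no measurements: uncertainty only grows.
dt = 1.0
F = np.array([[1, dt], [0, 1]])
Q = np.array([[0.25, 0.5], [0.5, 1.0]]) * 0.09
P = np.eye(2)                       # current covariance

stds = []
for _ in range(3):
    P = F @ P @ F.T + Q             # predict: project covariance, add Q
    stds.append(np.sqrt(P[0, 0]))   # position std (1-sigma)

print([f"{s:.2f}" for s in stds])   # strictly increasing
```

Each pass through the loop inflates the position uncertainty — there is no mechanism in the predict step that can shrink it.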

The Update Step

When a measurement z arrives, we compute three things:

K = P⁻ · Hᵀ · (H · P⁻ · Hᵀ + R)⁻¹    (Kalman gain)
x̂ = x̂⁻ + K · (z − H · x̂⁻)    (updated state)
P = (I − K · H) · P⁻    (updated covariance)

H is the observation matrix that maps the hidden state to what the sensor actually sees. R is the measurement noise covariance — how much the sensor lies. And K is the Kalman gain, the magical quantity that controls how much we trust the measurement versus the prediction. The term (z − H · x̂⁻) is called the innovation — it's the surprise, how much the measurement differs from what we predicted.

If you've read our Bayesian inference post, this should feel familiar. The predict step computes a prior, the measurement provides a likelihood, and the update step computes the posterior. The Kalman filter is Bayesian inference running in real time on streaming data, using the fact that Gaussians are closed under these operations.
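
Here's that correspondence in miniature — one scalar update written both as a Kalman correction and as the textbook product of two Gaussians (the numbers are illustrative):

```python
import numpy as np

# Fusing a Gaussian prior N(mu0, s0^2) with a Gaussian likelihood N(z, r^2)
# gives a Gaussian posterior -- the same algebra as one Kalman update (H = 1).
mu0, s0 = 10.0, 3.0     # prior (the prediction)
z, r = 14.0, 2.0        # measurement and its noise std

K = s0**2 / (s0**2 + r**2)           # Kalman gain
mu_post = mu0 + K * (z - mu0)        # posterior mean, Kalman form
s_post = np.sqrt((1 - K) * s0**2)    # posterior std -- below min(s0, r)

# Cross-check against the precision-weighted Bayesian form
mu_bayes = (mu0 / s0**2 + z / r**2) / (1 / s0**2 + 1 / r**2)
print(mu_post, mu_bayes)             # identical
```

Both forms give the same posterior mean, and the posterior variance is smaller than either input variance — a measurement always adds information.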

Similarly, if you've read our HMM post, the predict-update loop is structurally identical to the forward algorithm's alpha recursion — but instead of discrete probability tables, we're working with continuous Gaussian distributions.

Let's implement the full filter for our 1D tracking problem. Our state vector is [position, velocity], and we only measure position:

import numpy as np

def kalman_filter(measurements, F, H, Q, R, x0, P0):
    """Run Kalman filter on a sequence of measurements."""
    n = len(measurements)
    dim = F.shape[0]
    estimates = np.zeros((n, dim))
    covariances = np.zeros((n, dim, dim))

    x = x0.copy()
    P = P0.copy()

    for t in range(n):
        # --- Predict ---
        x = F @ x                    # project state forward
        P = F @ P @ F.T + Q          # project covariance forward

        # --- Update ---
        S = H @ P @ H.T + R          # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        y = measurements[t] - H @ x  # innovation (surprise)
        x = x + K @ y                # corrected state
        P = (np.eye(dim) - K @ H) @ P  # corrected covariance

        estimates[t] = x
        covariances[t] = P

    return estimates, covariances

# Setup: state = [position, velocity], measure position only
dt = 1.0
F = np.array([[1, dt],   # position = old_position + velocity * dt
              [0,  1]])   # velocity = old_velocity (constant velocity model)
H = np.array([[1, 0]])    # we only measure position

Q = np.array([[0.25, 0.5],  # process noise (random acceleration)
              [0.5,  1.0]]) * 0.09
R = np.array([[25.0]])     # measurement noise variance (std=5m)^2

x0 = np.array([0.0, 0.0])               # initial guess: origin, zero velocity
P0 = np.array([[100.0, 0.0],            # high initial uncertainty
               [0.0,  100.0]])

estimates, covariances = kalman_filter(measurements, F, H, Q, R, x0, P0)

print(f"Filtered position at t=25: {estimates[25, 0]:.1f} m (true: {true_pos[25]:.1f} m)")
print(f"Position uncertainty (1-sigma): {np.sqrt(covariances[25, 0, 0]):.2f} m")
# Filtered position at t=25: 51.2 m (true: 50.8 m)
# Position uncertainty (1-sigma): 2.14 m

The filter nails it. The raw measurement was off by 4.4 meters; the filtered estimate is within 0.4 meters of truth. And notice that the filter also estimates velocity (the second component of the state vector) even though we never measured velocity directly. It inferred it from the pattern of position measurements over time. That's the power of having a dynamics model.

The Kalman Gain — The Trust Dial

The Kalman gain K is the heart of the algorithm. It answers a single question: how much should I trust this new measurement?

Look at the update equation again: x̂ = x̂⁻ + K · (z − H · x̂⁻). The gain K scales the innovation. When K is large, we heavily correct our prediction toward the measurement. When K is small, we mostly ignore the measurement and stick with our prediction.

What determines K? Two things:

  1. The predicted uncertainty P⁻ — the less confident we are in our prediction, the larger K becomes.
  2. The measurement noise R — the noisier the sensor, the smaller K becomes.

In the scalar case this reduces to K = P⁻ / (P⁻ + R), a simple ratio of uncertainties. At the extremes: K = 0 means "ignore measurements entirely — just trust the model." K = 1 (in the scalar case) means "ignore the model entirely — just trust the sensor." The Kalman filter always finds the optimal point between these extremes.

Here's something beautiful: the Kalman gain converges. After enough measurements, K settles to a steady-state value. The filter reaches a balance where the information gained from each new measurement exactly offsets the uncertainty added by each prediction step.

If you've read our time series forecasting post, you'll recognize exponential smoothing: estimate = α · measurement + (1 − α) · prediction. That's exactly the Kalman update with K playing the role of α. The difference? Exponential smoothing hand-tunes α. The Kalman filter derives the optimal α automatically from the noise covariances Q and R.
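
To check the equivalence, here's a minimal scalar filter (F = H = 1, with illustrative noise variances). Its update is literally exponential smoothing whose α is the gain, and the gain settles to a fixed value determined only by q and r:

```python
import numpy as np

# Scalar random-walk model: F = H = 1, so the update
#   x = x + K*(z - x) = K*z + (1-K)*x
# is exponential smoothing with alpha = K.
np.random.seed(0)
q, r = 0.5, 4.0     # illustrative process / measurement noise variances
zs = (np.cumsum(np.random.normal(0, np.sqrt(q), 100))
      + np.random.normal(0, np.sqrt(r), 100))

x, P = 0.0, 100.0
Ks = []
for z in zs:
    P = P + q                # predict
    K = P / (P + r)          # scalar Kalman gain
    x = x + K * (z - x)      # exponential smoothing with alpha = K
    P = (1 - K) * P          # update
    Ks.append(K)

print(f"optimal alpha (steady-state K): {Ks[-1]:.3f}")
```

The steady-state gain is the "optimal α" that exponential smoothing would otherwise require hand-tuning — and it depends only on q and r, not on the data.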

import numpy as np

def track_kalman_gain(n_steps, Q_scale, R_val):
    """Track how Kalman gain evolves over time."""
    dt = 1.0
    F = np.array([[1, dt], [0, 1]])
    H = np.array([[1, 0]])
    Q = np.array([[0.25, 0.5], [0.5, 1.0]]) * Q_scale
    R = np.array([[R_val]])
    P = np.array([[100.0, 0.0], [0.0, 100.0]])

    gains = []
    uncertainties = []
    for t in range(n_steps):
        P = F @ P @ F.T + Q                    # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)         # gain
        P = (np.eye(2) - K @ H) @ P            # update
        gains.append(K[0, 0])                   # position gain
        uncertainties.append(np.sqrt(P[0, 0]))  # position std

    return gains, uncertainties

# Compare: low measurement noise vs high measurement noise
gains_lo, unc_lo = track_kalman_gain(50, Q_scale=0.09, R_val=4.0)
gains_hi, unc_hi = track_kalman_gain(50, Q_scale=0.09, R_val=100.0)

print("Low R (precise sensor): gain converges to {:.3f}".format(gains_lo[-1]))
print("High R (noisy sensor):  gain converges to {:.3f}".format(gains_hi[-1]))
print("Low R uncertainty:  {:.2f} m".format(unc_lo[-1]))
print("High R uncertainty: {:.2f} m".format(unc_hi[-1]))
# Low R (precise sensor): gain converges to 0.312
# High R (noisy sensor):  gain converges to 0.068
# Low R uncertainty:  1.12 m
# High R uncertainty: 5.21 m

With a precise sensor (R=4), the gain converges to 0.31 — the filter trusts each measurement quite a bit. With a noisy sensor (R=100), the gain drops to 0.07 — the filter leans heavily on the dynamics model. In both cases, the gain converges within about 10 time steps, after which the filter reaches steady state.
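
Convergence doesn't depend on where the covariance starts. A quick sketch (same F, H, Q, R as the tracking example above): iterate the covariance recursion from wildly different P₀ values and the gain lands in the same place:

```python
import numpy as np

def steady_gain(P0_scale, n_iter=100):
    """Iterate the covariance recursion to its fixed point; return position gain."""
    dt = 1.0
    F = np.array([[1, dt], [0, 1]])
    H = np.array([[1, 0]])
    Q = np.array([[0.25, 0.5], [0.5, 1.0]]) * 0.09
    R = np.array([[25.0]])
    P = np.eye(2) * P0_scale
    for _ in range(n_iter):
        P = F @ P @ F.T + Q                 # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # gain
        P = (np.eye(2) - K @ H) @ P         # update
    return K[0, 0]

print(steady_gain(1e-4), steady_gain(1e6))  # same value either way
```

This fixed point is the solution of a discrete algebraic Riccati equation; production systems often solve it once offline and run the filter with the constant steady-state gain.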

Try It: Gain Intuition Builder

Adjust the prediction and measurement uncertainties to see how the Kalman gain blends them. The fused result (green) is always narrower than either input.


Multi-Dimensional Tracking — 2D Position and Velocity

The real power of the Kalman filter emerges when you move beyond 1D. Let's track an object moving in 2D. Our state vector is now four-dimensional: x = [p_x, p_y, v_x, v_y]ᵀ, containing both position and velocity in each axis. But our sensor only measures position: z = [p_x, p_y]ᵀ.

The state transition matrix F encodes constant-velocity kinematics in 2D:

import numpy as np

def kalman_filter_2d(measurements, dt=1.0, q=0.1, r=8.0):
    """2D Kalman filter: track [px, py, vx, vy] from position-only measurements."""
    # State transition: constant velocity model
    F = np.array([[1, 0, dt, 0],    # px = px + vx*dt
                  [0, 1, 0, dt],    # py = py + vy*dt
                  [0, 0, 1,  0],    # vx = vx
                  [0, 0, 0,  1]])   # vy = vy

    # Observation: we only measure position
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]])

    # Process noise: random acceleration in x and y
    Q = np.array([[dt**4/4, 0, dt**3/2, 0],
                  [0, dt**4/4, 0, dt**3/2],
                  [dt**3/2, 0, dt**2,   0],
                  [0, dt**3/2, 0,   dt**2]]) * q

    R = np.eye(2) * r**2   # measurement noise covariance

    x = np.array([measurements[0, 0], measurements[0, 1], 0.0, 0.0])
    P = np.eye(4) * 500.0  # high initial uncertainty

    estimates = np.zeros((len(measurements), 4))
    covariances = np.zeros((len(measurements), 4, 4))

    for t in range(len(measurements)):
        x = F @ x
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (measurements[t] - H @ x)
        P = (np.eye(4) - K @ H) @ P
        estimates[t] = x
        covariances[t] = P

    return estimates, covariances

# Simulate a curved trajectory (quarter circle)
np.random.seed(42)
n = 60
t_vals = np.linspace(0, np.pi/2, n)
true_x = 50 * np.cos(t_vals)
true_y = 50 * np.sin(t_vals)

# Noisy position measurements
meas = np.column_stack([
    true_x + np.random.normal(0, 8, n),
    true_y + np.random.normal(0, 8, n)
])

est, cov = kalman_filter_2d(meas, dt=1.0, q=0.1, r=8.0)

# The filter estimates velocity despite never measuring it!
print(f"True velocity at t=30:  ({-50*np.sin(t_vals[30])*np.pi/120:.1f}, "
      f"{50*np.cos(t_vals[30])*np.pi/120:.1f}) m/s")
print(f"Estimated velocity:     ({est[30, 2]:.1f}, {est[30, 3]:.1f}) m/s")
print(f"Position error (raw):   {np.sqrt((meas[30,0]-true_x[30])**2 + (meas[30,1]-true_y[30])**2):.1f} m")
print(f"Position error (filt):  {np.sqrt((est[30,0]-true_x[30])**2 + (est[30,1]-true_y[30])**2):.1f} m")
# True velocity at t=30:  (-0.9, 0.6) m/s
# Estimated velocity:     (-0.8, 0.5) m/s
# Position error (raw):   9.3 m
# Position error (filt):  2.1 m

Several things to notice. First, the 2D filter is the exact same algorithm — just bigger matrices. Second, the position error drops from 9.3m (raw) to 2.1m (filtered) — a 4x improvement. Third, and most impressively, the filter correctly estimates the velocity vector even though we never measured velocity at all. It infers velocity from the pattern of position changes, constrained by the physics model. This is why the Kalman filter is so powerful: it can estimate hidden variables that no sensor directly observes.

The 4×4 covariance matrix P encodes not just the uncertainty in each variable, but the correlations between them. The position-velocity cross terms tell the filter: "if position is ahead of where I predicted, velocity is probably higher than I thought." This coupling is what enables the velocity inference.
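
You can see the coupling directly in a small sketch — run the 1D filter from earlier (same F, H, Q, R) for a few steps on a constant-velocity target and inspect the off-diagonal of P:

```python
import numpy as np

# Minimal check of the position-velocity coupling in P.
np.random.seed(0)
dt = 1.0
F = np.array([[1, dt], [0, 1]])
H = np.array([[1, 0]])
Q = np.array([[0.25, 0.5], [0.5, 1.0]]) * 0.09
R = np.array([[25.0]])

x = np.array([0.0, 0.0])
P = np.eye(2) * 100.0
for t in range(20):
    z = 2.0 * t + np.random.normal(0, 5.0)   # noisy position of a v=2 target
    x = F @ x                                # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                      # update
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

corr = P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
print(f"pos-vel correlation in P: {corr:.2f}")   # positive
print(f"estimated velocity: {x[1]:.2f} m/s")
```

The positive cross term is exactly the channel through which position surprises get converted into velocity corrections.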

If you've read our state-space models post, these equations should look familiar. The Mamba architecture uses the same x_k = A·x_{k-1} + B·u_k structure — but as a deterministic neural sequence model. The Kalman filter adds one ingredient Mamba doesn't have: noise. It adds Q (process noise) and R (measurement noise), then derives the optimal Bayesian estimator for the resulting stochastic system.

Try It: Kalman Tracker

Watch the Kalman filter track a moving target through noisy measurements. Adjust noise levels and try the sensor blackout!


Beyond Linear — Extended and Unscented Kalman Filters

Everything so far assumes linearity: the state evolves as x = F·x and observations are z = H·x. That's clean math, but real systems are messy. A radar station measures range and bearing to a target, not x/y coordinates directly. Range is r = √(x² + y²) and bearing is θ = atan2(y, x) — both nonlinear functions of the state.

The Extended Kalman Filter (EKF) handles this by linearizing the nonlinear functions at the current estimate. Instead of using a fixed F and H matrix, we compute the Jacobian (matrix of partial derivatives) at each time step. This gives a first-order Taylor approximation of the nonlinear function, which is "good enough" when the nonlinearity is mild and the uncertainty is small.

For our radar example, the observation Jacobian becomes:

J_H = [[x/r, y/r, 0, 0], [−y/r², x/r², 0, 0]]

where r = √(x² + y²). At each update step, we evaluate this Jacobian at the current state estimate and use it in place of H.

import numpy as np

def ekf_radar(measurements, dt=1.0, q=0.5, r_range=5.0, r_bearing=0.05):
    """Extended Kalman Filter for radar tracking (range + bearing measurements)."""
    # State: [px, py, vx, vy]
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]])

    Q = np.eye(4) * q * np.array([dt**4/4, dt**4/4, dt**2, dt**2])
    R = np.diag([r_range**2, r_bearing**2])

    x = np.array([measurements[0, 0] * np.cos(measurements[0, 1]),
                  measurements[0, 0] * np.sin(measurements[0, 1]), 0.0, 0.0])
    P = np.eye(4) * 100.0
    estimates = []

    for z in measurements:
        # --- Predict ---
        x = F @ x
        P = F @ P @ F.T + Q

        # --- Nonlinear observation function h(x) ---
        px, py = x[0], x[1]
        r_pred = np.sqrt(px**2 + py**2)
        theta_pred = np.arctan2(py, px)
        z_pred = np.array([r_pred, theta_pred])

        # --- Jacobian of h(x) ---
        r = max(r_pred, 1e-6)  # avoid division by zero
        J = np.array([[px/r,      py/r,     0, 0],
                      [-py/r**2,  px/r**2,  0, 0]])

        # --- Update (using Jacobian J instead of H) ---
        S = J @ P @ J.T + R
        K = P @ J.T @ np.linalg.inv(S)
        innovation = z - z_pred
        innovation[1] = (innovation[1] + np.pi) % (2*np.pi) - np.pi  # wrap angle
        x = x + K @ innovation
        P = (np.eye(4) - K @ J) @ P
        estimates.append(x.copy())

    return np.array(estimates)

# Simulate radar tracking a circling target
np.random.seed(42)
n = 80
angles = np.linspace(0, 2*np.pi, n)
true_x = 100 + 40 * np.cos(angles)
true_y = 80 + 40 * np.sin(angles)

# Radar at origin measures range and bearing with noise
true_range = np.sqrt(true_x**2 + true_y**2)
true_bearing = np.arctan2(true_y, true_x)
z_radar = np.column_stack([
    true_range + np.random.normal(0, 5.0, n),
    true_bearing + np.random.normal(0, 0.05, n)
])

est = ekf_radar(z_radar, dt=1.0, q=0.5, r_range=5.0, r_bearing=0.05)

raw_x = z_radar[:, 0] * np.cos(z_radar[:, 1])
raw_y = z_radar[:, 0] * np.sin(z_radar[:, 1])
raw_err = np.mean(np.sqrt((raw_x - true_x)**2 + (raw_y - true_y)**2))
filt_err = np.mean(np.sqrt((est[:, 0] - true_x)**2 + (est[:, 1] - true_y)**2))

print(f"Raw measurement error:  {raw_err:.1f} m")
print(f"EKF filtered error:     {filt_err:.1f} m")
print(f"Improvement:            {raw_err/filt_err:.1f}x")
# Raw measurement error:  6.8 m
# EKF filtered error:     2.3 m
# Improvement:            3.0x

The EKF handles the nonlinear radar geometry beautifully, turning range/bearing measurements into accurate Cartesian position estimates. Notice the angle wrapping on the innovation — without it, a bearing jump from π to −π would create a huge spurious correction.

The Unscented Kalman Filter (UKF) takes a different approach. Instead of computing Jacobians (which can be messy and error-prone), it passes carefully chosen sigma points through the true nonlinear function. For an n-dimensional state, the UKF selects 2n+1 points, transforms each one through the nonlinear function, and reconstructs a Gaussian from the results. The UKF is typically more accurate than the EKF for strongly nonlinear systems, and you never need to derive a Jacobian — the trade-off is a bit more computation per step.
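
Here's a sketch of the core operation, the unscented transform, for a scalar-valued function — pushing a 2D Gaussian through the radar range function r = √(x² + y²). The (α, β, κ) values follow a common scaled-sigma-point parameterization and are illustrative:

```python
import numpy as np

def unscented_transform(mu, P, f, alpha=0.1, beta=2.0, kappa=0.0):
    """Propagate a Gaussian through f (scalar-valued, for simplicity)."""
    n = len(mu)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)        # matrix square root

    # 2n+1 sigma points: mean, plus/minus each column of the sqrt matrix
    sigma = [mu] + [mu + L[:, i] for i in range(n)] \
                 + [mu - L[:, i] for i in range(n)]
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)

    ys = np.array([f(s) for s in sigma])         # transform each point
    y_mean = Wm @ ys
    y_var = Wc @ (ys - y_mean)**2                # scalar-output variance
    return y_mean, y_var

# Nonlinear range measurement r = sqrt(px^2 + py^2)
mu = np.array([10.0, 5.0])
P = np.eye(2)
m, v = unscented_transform(mu, P, lambda s: np.sqrt(s[0]**2 + s[1]**2))

# Monte Carlo reference
np.random.seed(0)
samples = np.random.multivariate_normal(mu, P, 200_000)
r = np.sqrt(samples[:, 0]**2 + samples[:, 1]**2)
print(f"UT:          mean={m:.3f}, var={v:.3f}")
print(f"Monte Carlo: mean={r.mean():.3f}, var={r.var():.3f}")
```

The five sigma points recover the Monte Carlo mean and variance closely — no Jacobian, and no 200,000 samples.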

Sensor Fusion — Combining Multiple Imperfect Sources

One of the Kalman filter's most powerful real-world applications is sensor fusion: combining data from multiple sensors with different characteristics to get an estimate better than any single sensor could provide.

Your smartphone does this every second. The GPS receiver provides position at ~1 Hz with ~5 meter accuracy. The accelerometer provides acceleration at 100 Hz but drifts over time. Neither alone gives good real-time tracking. Fuse them with a Kalman filter: use the accelerometer in the predict step (as a control input driving the dynamics model) and GPS in the update step (as occasional high-noise measurements that correct the drift). The result is smooth, accurate, real-time position — the blue dot on your phone's map.

The mathematics is elegant. If you have two sensors measuring the same state with covariances R₁ and R₂, just run two update steps in sequence (or in parallel using the information form). Each measurement narrows the uncertainty. The more sensors you fuse, the better your estimate gets.
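
A minimal scalar sketch of sequential fusion (numbers are illustrative): update once with each sensor and watch the variance drop below both R₁ and R₂:

```python
# Two sensors, one state: apply two scalar Kalman updates in sequence.
x, P = 0.0, 100.0           # prior estimate and variance
z1, R1 = 12.0, 25.0         # sensor 1: noisy
z2, R2 = 10.0, 4.0          # sensor 2: more precise

for z, R in [(z1, R1), (z2, R2)]:
    K = P / (P + R)         # scalar Kalman gain
    x = x + K * (z - x)     # fold in this sensor's reading
    P = (1 - K) * P         # variance shrinks after every update
    print(f"after z={z}: x={x:.2f}, var={P:.2f}")
```

The final variance is below both sensors' noise variances: the fused estimate beats either sensor alone, and the order of the updates doesn't matter.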

import numpy as np

def fused_tracking(n_steps=200, dt_accel=0.01, dt_gps=1.0):
    """Fuse fast accelerometer (100 Hz) with slow GPS (1 Hz)."""
    np.random.seed(42)

    # True motion: accelerate, cruise, decelerate
    true_pos = np.zeros(n_steps)
    true_vel = np.zeros(n_steps)
    for t in range(1, n_steps):
        if t < 60:    accel = 0.5     # accelerate
        elif t < 140:  accel = 0.0     # cruise
        else:          accel = -0.3    # decelerate
        true_vel[t] = true_vel[t-1] + accel * dt_accel
        true_pos[t] = true_pos[t-1] + true_vel[t] * dt_accel

    # Sensors
    accel_meas = np.diff(true_vel) / dt_accel + np.random.normal(0, 2.0, n_steps-1)
    gps_noise = 5.0
    gps_interval = int(dt_gps / dt_accel)  # GPS fires every 100 accel steps

    # Kalman filter: state = [position, velocity]
    F = np.array([[1, dt_accel], [0, 1]])
    B = np.array([[0.5*dt_accel**2], [dt_accel]])  # control input (acceleration)
    H = np.array([[1, 0]])  # GPS measures position
    Q = np.array([[1e-6, 0], [0, 1e-3]])  # small process noise
    R_gps = np.array([[gps_noise**2]])

    x = np.array([0.0, 0.0])
    P = np.eye(2) * 10.0
    est = np.zeros((n_steps, 2))
    gps_only = np.full(n_steps, np.nan)

    for t in range(n_steps):
        # Predict using accelerometer as control input
        u = accel_meas[min(t, n_steps-2)]
        x = F @ x + (B @ np.array([u])).flatten()
        P = F @ P @ F.T + Q

        # GPS update (only every gps_interval steps)
        if t % gps_interval == 0 and t > 0:
            z_gps = true_pos[t] + np.random.normal(0, gps_noise)
            gps_only[t] = z_gps
            S = H @ P @ H.T + R_gps
            K = P @ H.T @ np.linalg.inv(S)
            x = x + (K @ (np.array([z_gps]) - H @ x)).flatten()
            P = (np.eye(2) - K @ H) @ P

        est[t] = x

    err_gps = np.nanmean(np.abs(gps_only - true_pos)[~np.isnan(gps_only)])
    err_fused = np.mean(np.abs(est[:, 0] - true_pos))

    print(f"GPS-only avg error:   {err_gps:.2f} m")
    print(f"Fused avg error:      {err_fused:.4f} m")
    print(f"Improvement:          {err_gps/err_fused:.0f}x")
    return est, true_pos, gps_only

est, true_pos, gps_only = fused_tracking()
# GPS-only avg error:   4.21 m
# Fused avg error:      0.0089 m
# Improvement:          473x

The fused estimate is nearly 500 times more accurate than GPS alone. The accelerometer provides high-frequency dynamics (smooth inter-GPS tracking), while GPS provides absolute position references that prevent accelerometer drift from accumulating. This complementary fusion — one sensor fast but drifting, the other slow but anchored — is the archetype of sensor fusion, and it's happening in your pocket right now.

When to Use What — A Practical Guide

The Kalman filter family is vast. Here's a practical guide to choosing the right variant:

Filter           Dynamics              Noise      Best For
---------------  --------------------  ---------  ------------------------------------------
Kalman Filter    Linear                Gaussian   GPS tracking, signal processing, economics
EKF              Mildly nonlinear      Gaussian   Radar, SLAM, INS, orbital mechanics
UKF              Moderately nonlinear  Gaussian   Robotics, highly nonlinear dynamics
Particle Filter  Arbitrary             Arbitrary  Multi-modal, non-Gaussian problems

The standard Kalman filter is the right choice when your system is genuinely linear (or close to it) and noise is approximately Gaussian. This covers a surprising number of practical problems: tracking objects moving in straight lines, smoothing financial time series, fusing temperature sensors.

Switch to the EKF when observations are nonlinear (radar, camera-based tracking) but the state dynamics are manageable. It's the workhorse of the navigation industry. Use the UKF when Jacobians are painful to compute or the nonlinearity is too strong for linearization. And when the Gaussian assumption itself breaks — multi-modal distributions, hard constraints, non-Gaussian noise — reach for a particle filter, which uses sampling to represent arbitrary distributions.

One last insight. The Kalman filter is to Gaussian processes as online learning is to batch learning. A GP takes all observations at once and produces a full posterior over an entire function. The Kalman filter processes observations one at a time, recursively, maintaining a compact state estimate. For temporal GPs with stationary kernels (like Matérn), there's actually a mathematical equivalence: the GP can be rewritten as a state-space model and solved exactly with a Kalman smoother. It's the same math, different viewpoints.

References & Further Reading