Elementary AI
Building understanding from the ground up
Imitation Learning from Scratch
Every RL algorithm needs a reward function — but what if you just have a human showing you what to do? Build the complete imitation learning pipeline from scratch: behavioral cloning as supervised learning on expert demonstrations, the devastating distributional shift problem where per-step errors compound quadratically into O(T²) total error over the horizon, DAgger's elegant fix of querying the expert on the learner's own states, feature-matching inverse RL that recovers hidden reward functions from behavior, and Maximum Entropy IRL that resolves reward ambiguity via the Boltzmann distribution. Two interactive demos let you watch BC drift off-path while DAgger self-corrects, and place expert waypoints to recover reward heatmaps in real time.
Hopfield Networks from Scratch
Your brain doesn't search for memories — it completes them. Build the oldest and most physically-grounded neural network from scratch: binary Hopfield networks with Hebbian learning, energy landscapes with convergence proofs, the 0.138N capacity wall from statistical mechanics, continuous Hopfield networks with exponential storage, and the stunning result that softmax attention IS a Hopfield update — making every Transformer a deep associative memory. Two demos let you draw and recall patterns and watch Hopfield and attention produce identical outputs live.
Survival Analysis from Scratch
Every ML model you've built assumes you see the outcome — but what happens when patients drop out, customers haven't churned yet, or machines are still running? Build the complete survival analysis toolkit from scratch: Kaplan-Meier estimators with Greenwood confidence bands, the log-rank test, Cox Proportional Hazards with partial likelihood and Breslow baseline estimation, concordance index evaluation, and Weibull MLE with censoring. Two interactive demos let you explore survival curves with adjustable censoring and watch Cox PH coefficients reshape hazard in real time.
Program Synthesis from Scratch
Every Copilot suggestion is a program synthesizer at work. Build the entire progression from scratch: brute-force enumeration over AST grammars with observational equivalence pruning, CEGIS counterexample-guided refinement (the engine behind Excel's FlashFill), neural-guided search with a hand-trained MLP, and the LLM self-repair loop that powers modern coding assistants. Two interactive demos let you synthesize programs from I/O examples and watch CEGIS converge in real time.
Approximate Nearest Neighbors from Scratch
You have 50 million embeddings and 20ms to find the 10 most similar — brute force is physically impossible. Build locality-sensitive hashing, navigable small-world graphs (HNSW), and product quantization from first principles in NumPy, then race them head-to-head in two interactive demos that make the recall-latency-memory trilemma visceral.
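The first of those three ideas fits in a few lines. A minimal random-hyperplane LSH sketch in NumPy — the toy clusters and plane count below are invented for illustration, not taken from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signatures(vectors, n_planes=64):
    """Hash vectors to binary signatures via random hyperplanes:
    vectors with high cosine similarity tend to share many bits."""
    planes = rng.normal(size=(n_planes, vectors.shape[1]))
    return (vectors @ planes.T > 0).astype(np.uint8)

# toy corpus: two tight clusters of 4-d "embeddings"
base = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
points = np.repeat(base, 3, axis=0) + rng.normal(scale=0.05, size=(6, 4))
sigs = lsh_signatures(points)

# bit agreement is high within a cluster, near 50% across clusters
within = (sigs[0] == sigs[1]).mean()
across = (sigs[0] == sigs[3]).mean()
print(within, across)
```

Bucketing by signature prefix then turns the O(n) scan into a lookup over a handful of candidate buckets — that is the recall-for-latency trade the demos make visceral.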
Neuroevolution from Scratch
What if you could train neural networks the way nature trains organisms — through mutation, selection, and survival of the fittest? Evolve network weights with a simple GA, grow topology and weights simultaneously with NEAT's innovation numbers and speciation, compress millions of weights through HyperNEAT's CPPNs, and scale to industrial problems with evolution strategies. Two interactive demos let you watch a network evolve to solve XOR and race ES against gradient descent on a multi-modal landscape.
Differentiable Programming from Scratch
Backpropagation lets you optimize neural networks. Differentiable programming lets you optimize anything — sorting algorithms, physics simulators, 3D renderers, even database queries. Build two complete autodiff engines (forward-mode with dual numbers and reverse-mode with computational graphs), learn to differentiate through non-differentiable operations using the Straight-Through Estimator and soft relaxations, apply the Implicit Function Theorem to differentiate through solver solutions, push gradients through a physics simulator, and survey how JAX, PyTorch, and Zygote make it all composable.
World Models from Scratch
Model-free RL agents need millions of interactions to learn simple tasks. Humans just imagine the outcome. World Models bring this ability to artificial agents — a VAE compresses observations, an MDN-RNN predicts future states, and a tiny linear controller learns to act entirely inside learned "dreams." Build the full V-M-C pipeline from scratch, train a controller that never touches the real environment, and explore two interactive demos where you watch an agent learn to dream and plan.
Neural Processes from Scratch
Gaussian Processes give beautiful uncertainty but choke at O(n³). Neural networks scale effortlessly but give point predictions with no honesty about what they don't know. Neural Processes combine both — learning to map context points to full predictive distributions in a single forward pass. Build the CNP, Latent NP, and Attentive NP from scratch, train with episodic meta-learning, and explore two interactive demos where you click to add context and watch uncertainty shrink in real time.
Spiking Neural Networks from Scratch
Your brain runs on 20 watts. GPT-4 training used 50 GWh. The secret? Biological neurons communicate with precisely timed electrical spikes, not floating-point numbers. Build the LIF neuron model, encode data as spike trains, learn with STDP, and train deep SNNs with surrogate gradients. Two demos let you inject current into a spiking neuron and watch STDP reshape a synapse in real time.
Domain Adaptation from Scratch
Your model hits 97% accuracy on test data — then drops to 61% in production. The data changed, not the model. Build every major domain adaptation technique from Ben-David's theoretical bound through MMD, CORAL, and DANN with the gradient reversal layer. Two interactive demos let you watch CORAL recover accuracy on shifted data and see adversarial training force domain-invariant features in real time.
Multi-Task Learning from Scratch
What if solving more problems made your model better at each one? Build multi-task networks with hard parameter sharing, then tackle the real challenges: uncertainty weighting to balance loss scales, PCGrad to surgically remove gradient conflicts, and GradNorm to equalize training rates. Two interactive demos let you compare shared vs separate networks and visualize gradient surgery in a 2D loss landscape.
Symbolic Regression from Scratch
Neural networks approximate functions — but what if your model could hand you the actual equation? Build a full genetic programming engine from scratch: expression trees, tournament selection, subtree crossover, parsimony pressure, and linear scaling. Two interactive demos let you evolve expressions in real time and rediscover physics laws from noisy data with Pareto front visualization.
Curriculum Learning from Scratch: Teaching Neural Networks the Way Humans Learn
You learned to add before calculus — why do we train neural networks on shuffled data? Curriculum learning presents easy examples first, and there's deep theory behind it: continuation methods that smooth the loss landscape. Build self-paced learning from scratch, explore focal loss as anti-curriculum, implement forgetting events for data pruning, and watch two interactive demos compare curriculum vs random training in real time.
Mixture Density Networks from Scratch: When Neural Networks Output Probability Distributions Instead of Point Predictions
Standard neural networks predict the average — catastrophically wrong when multiple answers are valid. MDNs output Gaussian mixture parameters instead: means, variances, and mixing weights that describe the full conditional distribution. Build Bishop's 1994 architecture from scratch with the log-sum-exp trick, train on inverse problems, and explore two interactive demos — click to query conditional densities and watch a real-time training heatmap.
Kolmogorov-Arnold Networks from Scratch: Learnable Activations, B-Splines, and the End of Fixed Neurons
MLPs put fixed activations on neurons and learn the weights. KANs flip this — learnable B-spline activations live on edges, nodes just sum. Build KANs from the 1957 Kolmogorov-Arnold theorem through B-spline basis functions, residual SiLU edges, grid refinement training, and symbolic regression. Two interactive demos let you watch splines evolve during training and pit a KAN against an MLP on function approximation.
Byte-Level Models from Scratch: UTF-8 Encodings, ByT5, and the End of Tokenization
Type "hello" — 5 bytes, 1 token. Type "สวัสดี" — 18 bytes, 6 tokens. Same greeting, 6x the cost. Tokenization isn't a solved problem — it's a bottleneck. Build byte-level models from UTF-8 encoding through ByT5's pooling architecture, MegaByte's local-global patching, and MambaByte's linear-time scanning. Two interactive demos let you measure the tokenization tax across languages and watch three architectures scale from 64 to 2048 bytes.
Geometric Deep Learning from Scratch: Symmetry, Groups, and the Blueprint That Unifies CNNs, GNNs, and Transformers
CNNs, GNNs, and Transformers aren't three separate inventions — they're three consequences of one principle: respect the symmetry. Derive convolution from translation equivariance, message passing from permutation equivariance, and attention from set equivariance. Then extend to rotation-equivariant group convolutions and the five-component blueprint that generates any architecture from its symmetry group. Two interactive demos let you verify equivariance and explore the unified framework.
Neural ODEs from Scratch: When Depth Becomes Continuous and Networks Learn to Flow
What happens when you let a ResNet have infinitely many layers? You get a Neural ODE — a network whose forward pass solves a differential equation. Build ODE solvers from scratch, implement the adjoint method for O(1)-memory training, derive continuous normalizing flows with the Hutchinson trace estimator, and see why augmented Neural ODEs break the homeomorphism barrier. Two interactive demos let you watch continuous-depth classification and irregular time series modeling in action.
Energy-Based Models from Scratch: Boltzmann Machines, Contrastive Divergence, and Score Matching
Every probability distribution is an energy landscape — low energy means high probability. Build the framework that unifies generative AI: Hopfield networks as associative memory, RBMs with contrastive divergence, MCMC sampling via Langevin dynamics, and score matching that connects directly to modern diffusion models. Two interactive demos let you watch energy minimization in action.
Video Understanding from Scratch: Optical Flow, 3D Convolutions, and Video Transformers
A video of throwing and a video of catching contain identical frames in different order — temporal ordering is the signal. Build video understanding from first principles: frame differences, Lucas-Kanade optical flow, 3D convolutions (C3D, I3D, R(2+1)D), two-stream networks, divided space-time attention (TimeSformer, ViViT), and modern VideoMAE pretraining. Two interactive demos visualize flow fields and attention patterns.
Text-to-Speech from Scratch: Teaching Machines to Speak with Mel Spectrograms and Neural Vocoders
From phonemes to waveforms — build the complete TTS pipeline: text normalization, duration prediction, Tacotron-style mel generation, Griffin-Lim phase estimation, and neural vocoders like WaveNet and HiFi-GAN. Two interactive demos let you drag phoneme durations and watch Griffin-Lim iteratively reconstruct audio from magnitude spectrograms.
Neural Architecture Search from Scratch: Teaching Machines to Design Neural Networks
You hand-design a neural network and hope it's good enough. But what if a machine could search over billions of possible architectures automatically? Build NAS from first principles: cell-based search spaces, random search with successive halving, evolutionary architecture search, differentiable DARTS with bilevel optimization, weight-sharing supernets, and hardware-aware multi-objective search — with interactive demos that let you explore architecture DAGs and watch three search strategies race across a fitness landscape.
Speech Recognition from Scratch: From Sound Waves to Text with CTC and Attention
You say "Hey Siri, set a timer" and 0.8 seconds of air pressure waves become text. Build the complete ASR pipeline from first principles: mel spectrograms, the CTC forward algorithm for alignment-free training, greedy and beam search decoding with prefix merging, attention-based encoder-decoders, Word Error Rate evaluation, and modern architectures like Whisper and Conformer — with interactive demos that let you explore the CTC trellis and watch beam search outperform greedy decoding in real time.
Knowledge Graphs from Scratch
Google "who directed Inception" and a Knowledge Panel appears instantly — that's a knowledge graph with billions of (entity, relation, entity) triples. Build the complete KG toolkit from first principles: graph construction, TransE translation embeddings, DistMult and ComplEx bilinear models, link prediction evaluation, multi-hop reasoning, and the convergence of KGs with LLMs — with interactive demos that let you explore a live knowledge graph and watch embedding geometry reshape under training.
Dense Retrieval from Scratch
Search for "leaking faucet" and the best result says "stopping drips from fixtures" — zero keyword overlap. Build the entire dense retrieval stack from first principles: BM25 baselines, bi-encoder architecture, InfoNCE contrastive training, hard negative mining, product quantization for billion-scale search, and ColBERT's MaxSim late interaction — with interactive demos that let you explore embedding space and watch contrastive learning reshape it.
Federated Learning from Scratch: Training Models Without Sharing Data
Five hospitals each hold 10,000 MRI scans. HIPAA says they can't pool them. Federated learning says they don't have to. Build the complete FL toolkit from first principles: FedAvg, communication-efficient sparsification with error feedback, FedProx for non-IID data, secure aggregation via pairwise masking, and DP-FedAvg for formal privacy guarantees — with interactive demos that let you run federated training across virtual clients and explore how label skew affects convergence.
Continual Learning from Scratch
Train a neural network on cats vs dogs — 95% accuracy. Now train it on cars vs trucks — 93%. Test it on cats vs dogs again: 52%. This is catastrophic forgetting. Build the complete continual learning toolkit from first principles: EWC, experience replay with reservoir sampling, PackNet, Learning without Forgetting, and evaluation metrics — with interactive demos that let you watch forgetting happen in real time and see how each defense preserves knowledge.
Differential Privacy from Scratch
Anonymization is fundamentally broken — Netflix, AOL, and hospital records have all been re-identified. Build the complete differential privacy toolkit from first principles: randomized response, Laplace and Gaussian mechanisms, composition theorems, DP-SGD for private deep learning, and the exponential mechanism — with interactive demos that let you spend a privacy budget on real queries and watch DP-SGD train in real time.
Audio Features from Scratch: From Sound Waves to Spectrograms, MFCCs, and Neural Audio
Every song and every spoken word is just a list of integers. Build the complete audio feature pipeline from scratch — Fourier transforms, spectrograms, mel filterbanks, MFCCs — with interactive demos that let you hear signals, visualize the time-frequency tradeoff, and watch the MFCC extraction pipeline step by step.
Semantic Segmentation from Scratch: Classifying Every Pixel in an Image
Object detection draws rectangles. Semantic segmentation labels every single pixel. Build the complete segmentation pipeline from scratch — FCN, U-Net skip connections, dilated convolutions, Dice loss — with interactive demos comparing FCN's blurry predictions to U-Net's sharp boundaries and watching Dice loss outperform cross-entropy on imbalanced data.
Object Detection from Scratch: Finding and Labeling Every Object in an Image
Image classification says "cat." Object detection says "cat at (120, 45, 280, 190)." Build the complete detection pipeline from scratch — IoU, anchor boxes, NMS, YOLO's single-shot grid, anchor-free FCOS, and focal loss — with interactive demos for dragging bounding boxes and watching a detector learn in real time.
Data Augmentation from Scratch: Training Better Models with the Data You Already Have
Deep learning is hungry for data, but what if you could train better models without collecting more? Build the complete data augmentation toolkit from first principles — geometric transforms, color jitter, Mixup, CutMix, RandAugment, and text augmentation — with interactive demos showing how augmentation tames overfitting.
Active Learning from Scratch: Teaching Your Model to Ask the Right Questions
You have 100,000 unlabeled examples and a budget for 500 labels — which 500 should you pick? Build active learning from first principles: implement uncertainty sampling, query-by-committee, expected gradient length, and batch diversity strategies, then explore failure modes and modern LLM applications — with interactive demos pitting an active learner against random selection.
Implicit Bias of Gradient Descent from Scratch: Why Your Optimizer Is Secretly a Regularizer
Train an overparameterized model with no explicit regularization — yet it generalizes. Build implicit bias theory from scratch: prove GD finds minimum-norm solutions in linear regression, derive max-margin convergence for logistic loss, show depth induces low-rank bias via matrix factorization, demonstrate the edge of stability where sharpness self-stabilizes at 2/η, and explain why small-batch SGD finds flatter minima — with interactive demos for a minimum-norm explorer and a flat-vs-sharp minima visualizer.
Neural Tangent Kernels from Scratch: Why Infinitely Wide Networks Are Just Kernel Machines
A network with 10 million parameters fits 1,000 points perfectly — yet generalizes. Build Neural Tangent Kernel theory from scratch: derive the NTK as a Jacobian Gram matrix, prove the infinite-width convergence to a deterministic kernel, analyze training dynamics with exponential loss decay and spectral bias, measure the lazy-vs-rich regime transition across widths, and compute the analytic arccosine NTK recursion — with interactive demos for an empirical NTK explorer with eigenvalue spectrum and a lazy-vs-rich training visualizer.
Second-Order Optimization from Scratch: Beyond Gradient Descent with Curvature Information
Gradient descent treats all directions equally — but loss landscapes have curvature. Build second-order optimization from scratch: derive Newton's method and its quadratic convergence, implement L-BFGS two-loop recursion, compute natural gradients via the Fisher information matrix, approximate curvature with K-FAC for deep networks, and get Hessian-vector products for free — with interactive demos for an optimizer trajectory arena and a curvature visualizer.
Online Learning from Scratch: Making Decisions One at a Time with Regret Guarantees
Most ML assumes your data is i.i.d. What if it's adversarial? Build online learning from first principles — derive multiplicative weights with optimal regret bounds, implement online gradient descent with projection, unify everything under the Follow-the-Regularized-Leader framework, prove the online-to-batch conversion, and add AdaGrad's per-coordinate adaptivity — with interactive demos for an expert advice arena with adversarial modes and an online vs batch decision boundary visualizer.
Kernel Methods from Scratch: The Trick That Lets Linear Models Learn Nonlinear Patterns
Your data lives in 2D but the decision boundary is a circle. The fix: map to a higher-dimensional space where a hyperplane works. But what if that space is infinite-dimensional? Build the kernel trick from first principles — prove polynomial kernels compute exact feature-space dot products, verify Mercer's theorem via Gram matrix eigenvalues, compare six kernels side by side, compose custom kernels with algebraic closure properties, and kernelize ridge regression and PCA — with interactive demos for a kernel PCA feature-space visualizer and a Gram matrix explorer with eigenvalue spectrum.
Spectral Clustering from Scratch: Using Eigenvalues to Find Hidden Structure
K-means slices a straight line through your concentric rings and calls it a day. Build spectral clustering from scratch: construct RBF similarity graphs with three sparsification strategies, compute the graph Laplacian and its normalized variant, extract eigenvectors that unfold non-convex shapes into linearly separable embeddings, and run k-means in eigenspace with NJW row-normalization — with interactive demos for a 5-stage spectral clustering pipeline with σ tuning and a k-means vs spectral side-by-side accuracy arena.
Conditional Random Fields from Scratch: Structured Prediction Beyond Independent Labels
Your classifier tags each word independently and gets "mat" wrong because it never looks at neighboring predictions. Build a linear-chain CRF from scratch with emission and transition potentials, implement the forward algorithm for partition function computation, Viterbi decoding for optimal sequences, and forward-backward for gradient-based training — with interactive demos for a CRF sequence tagger with animated DP table and a transition matrix explorer showing how pairwise weights reshape predictions.
Semi-Supervised Learning from Scratch: Extracting Supervision from Unlabeled Data
You have 50 labeled images and 50,000 unlabeled ones. Build five semi-supervised methods from scratch: self-training with pseudo-labels, label propagation through RBF similarity graphs, Π-Model consistency regularization, entropy minimization, and MixMatch combining augmentation averaging, temperature sharpening, and MixUp — with interactive demos for a label propagation explorer and a supervised vs semi-supervised decision boundary comparison.
Feature Selection from Scratch: Finding the Signal in a Sea of Variables
You add 50 new features expecting better predictions — instead, accuracy drops. Build six feature selection methods from scratch: filter methods with mutual information, mRMR for redundancy-aware selection, forward selection and RFE wrappers, Lasso coordinate descent with regularization paths, permutation importance, and stability selection via bootstrapping — with interactive demos for a three-method importance arena and a curse of dimensionality visualizer.
Optimal Transport from Scratch: Moving Probability Mass at Minimum Cost
You have two piles of sand and want to reshape one into the other at minimum cost. This 200-year-old problem — from Monge's military logistics to Kantorovich's Nobel Prize — turned out to be exactly what modern ML needed. Build the Monge assignment problem, Kantorovich LP relaxation, Sinkhorn's algorithm, and Wasserstein barycenters from scratch — with interactive demos for transport plan visualization and a Wasserstein vs KL distance explorer.
Hierarchical Clustering from Scratch: Building Dendrograms That Reveal Data's Hidden Tree Structure
K-Means makes you pick k upfront. DBSCAN can't tell you which clusters are more similar to each other. Hierarchical clustering builds a dendrogram — a binary tree encoding every possible grouping at once. Build agglomerative clustering with all four linkage criteria, explore the Lance-Williams recurrence, and discover why single linkage equals the minimum spanning tree — with interactive demos for step-by-step merging and a four-way linkage showdown.
Conformal Prediction from Scratch: Distribution-Free Uncertainty with Guaranteed Coverage
MC Dropout and Deep Ensembles give useful uncertainty estimates — but no formal guarantees. Conformal prediction flips the question: what prediction sets are mathematically guaranteed to contain the true answer? Build split conformal methods, adaptive prediction sets, and conformalized quantile regression from first principles — with interactive demos for prediction set exploration and live coverage guarantee verification.
Uncertainty Quantification from Scratch: Teaching Neural Networks to Say "I Don't Know"
Your classifier reports 99% confidence — and is completely wrong. Softmax outputs aren't probabilities; they're normalized logits that grow unboundedly away from the decision boundary. Build reliability diagrams, MC Dropout, Deep Ensembles, and temperature scaling from first principles, and learn when to trust your model — with interactive demos for uncertainty heatmaps and live calibration tuning.
Meta-Learning from Scratch: Teaching Neural Networks to Learn New Tasks from Just a Few Examples
Show a child five characters from an alien alphabet and they'll start recognizing new ones within minutes. Standard neural networks need thousands of examples. Build prototypical networks and MAML from first principles, discover how episode-based training teaches models to learn from few examples, and connect it all to in-context learning in modern LLMs — with interactive demos for few-shot classification and real-time sinusoid adaptation.
Residual Networks from Scratch: Why Deeper Networks Need Shortcuts
Before ResNets, training networks deeper than 20 layers consistently failed — not from overfitting, but from a mysterious degradation problem. Build residual blocks, projection shortcuts, and bottleneck architectures from scratch, and discover why the skip connection is the single most important idea enabling modern deep learning — with interactive demos showing degradation in action and gradient flow through skip paths.
Adversarial Examples from Scratch: How Invisible Perturbations Fool Neural Networks
Add noise invisible to the human eye and a neural network classifies a panda as a gibbon with 99% confidence. Build FGSM and PGD attacks from first principles, discover why high-dimensional linearity makes every model vulnerable, and train adversarially robust networks — with interactive demos that let you craft attacks and watch decision boundaries shift.
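The FGSM step itself is one line of algebra. A hedged sketch on a hand-written linear classifier (the weights and input below are made up for the example) — for a linear model the loss gradient with respect to the input points along the weight vector, so the attack direction is just its sign:

```python
import numpy as np

# toy linear classifier: label = sign(w.x + b)
w = np.array([0.5, -1.2, 0.8])
b = 0.1

def predict(x):
    return 1 if x @ w + b > 0 else -1

x = np.array([0.9, -0.2, -0.4])
y = predict(x)                     # clean prediction: +1

# the margin y*(w.x + b) has input-gradient y*w, so FGSM
# steps each coordinate against sign(y*w) to shrink the margin
eps = 0.5
x_adv = x - eps * y * np.sign(w)

print(predict(x), predict(x_adv))  # prediction flips: 1 -> -1
```

With images, ε is tiny relative to pixel range — which is what makes the perturbation invisible; it is exaggerated here so a three-dimensional toy flips.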
Double Descent from Scratch: Why Bigger Models Generalize Better (And Classical Statistics Got It Wrong)
Every textbook teaches the bias-variance tradeoff: bigger models overfit. But GPT-4 has trillions of parameters and generalizes beautifully. Build the double descent curve from polynomial regression through neural networks, see how regularization masks the interpolation peak, and understand why scaling works — with interactive demos that let you explore all three regimes.
Normalizing Flows from Scratch: Invertible Neural Networks That Generate by Transforming
VAEs approximate likelihood, GANs abandon it entirely — but normalizing flows compute the exact probability of every data point. Build RealNVP coupling layers, Glow's 1×1 convolutions, and autoregressive flows from first principles, with interactive demos that let you scrub through flow transformations and explore learned densities.
Kalman Filter from Scratch: Predicting the Future by Trusting (But Verifying) Noisy Sensors
Every sensor lies — but the Kalman filter reconstructs truth from noise using elegant matrix algebra. Build the KF, EKF, and sensor fusion from scratch, with interactive demos that let you track objects and visualize how Gaussian fusion always reduces uncertainty.
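The scalar version of that predict-update loop is tiny. A minimal sketch assuming a roughly constant hidden state, with process and measurement noise levels (q, r) invented for the example:

```python
import numpy as np

def kalman_1d(zs, q=1e-3, r=0.5):
    """Minimal 1-D Kalman filter for a nearly constant signal.
    q = process noise variance, r = measurement noise variance."""
    x, p = 0.0, 1.0           # state estimate and its variance
    out = []
    for z in zs:
        p += q                # predict: variance grows by process noise
        k = p / (p + r)       # Kalman gain: how much to trust the sensor
        x += k * (z - x)      # update: move estimate toward measurement
        p *= (1 - k)          # fused variance shrinks below both inputs
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(1)
truth = 5.0
zs = truth + rng.normal(scale=0.7, size=200)   # noisy sensor readings
est = kalman_1d(zs)
print(abs(est[-1] - truth), abs(zs[-1] - truth))
```

The `p *= (1 - k)` line is the "Gaussian fusion always reduces uncertainty" claim in one statement: the posterior variance is strictly smaller than the prior's.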
DBSCAN from Scratch: When K-Means Fails and Density Saves the Day
K-Means bulldozes through non-convex shapes. Build DBSCAN from scratch with BFS cluster expansion, k-distance elbow method, KD-tree acceleration, and HDBSCAN's parameter-free hierarchical clustering — with interactive demos that let you compare all three algorithms on the same data.
Bandit Algorithms from Scratch: The Explore-Exploit Dilemma That Powers Modern AI
Should you exploit what works or explore something new? Build multi-armed bandit algorithms from scratch — greedy, epsilon-greedy, UCB1, and Thompson Sampling — with interactive demos that let you watch algorithms learn in real time.
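Of those four, UCB1 shows the exploration bonus most clearly. A sketch with Bernoulli arms whose means are invented for the example:

```python
import math
import random

def ucb1(means, horizon=5000, seed=0):
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    n = [0] * len(means)      # pull counts per arm
    s = [0.0] * len(means)    # reward sums per arm
    for t in range(1, horizon + 1):
        if t <= len(means):
            a = t - 1          # play each arm once to initialize
        else:
            a = max(range(len(means)),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        s[a] += rng.random() < means[a]   # Bernoulli reward
        n[a] += 1
    return n

pulls = ucb1([0.2, 0.5, 0.8])
print(pulls)   # the best arm (index 2) accumulates most of the pulls
```

The bonus term shrinks as an arm is sampled, so exploration decays automatically — no epsilon schedule to tune.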
K-Nearest Neighbors from Scratch: The Algorithm That Lets the Data Speak for Itself
No parameters, no training loop, no assumptions — just find the closest examples and let them vote. Build KNN from first principles with distance metrics, the curse of dimensionality, KD-trees, and interactive demos that let you paint decision boundaries in real time.
Hidden Markov Models from Scratch: Teaching Machines to Read Between the Lines
When the real signal is hidden, you need algorithms that reason under uncertainty. Build HMMs from absolute first principles — Markov chains, the Forward algorithm, Viterbi decoding, Backward posteriors, and Baum-Welch learning — with interactive demos that animate state transitions and trellis decoding step by step.
Linear Regression from Scratch: The Algorithm That Launched a Thousand Models
Every ML journey starts here. Build linear regression from absolute first principles — closed-form solutions, gradient descent, polynomial features, Ridge and Lasso regularization — with interactive demos that let you fit lines and watch coefficients shrink in real time.
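The closed-form solution mentioned above is a three-line function. A minimal sketch (the data is a noiseless toy line, and for brevity the ridge penalty here also shrinks the bias term, which a careful implementation would exclude):

```python
import numpy as np

def fit_ridge(X, y, lam=0.0):
    """Closed-form (ridge) regression: w = (X^T X + lam*I)^(-1) X^T y."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

# data from y = 2x + 1 with no noise: exact recovery at lam = 0
X = np.arange(10, dtype=float)
y = 2 * X + 1
w = fit_ridge(X, y)
print(w)   # [1. 2.] — intercept 1, slope 2, recovered exactly
```

Using `np.linalg.solve` rather than an explicit inverse is the standard numerically stable choice.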
Expectation-Maximization from Scratch
K-means assumes every cluster is a sphere — real data is messier. Build the EM algorithm from scratch: soft assignments via Bayes' rule, weighted parameter updates, monotonic convergence via the ELBO, and Gaussian Mixture Models that capture elliptical clusters k-means can't.
t-SNE from Scratch: Visualizing High-Dimensional Data
PCA finds the best linear projection — but most interesting structure isn't linear. Build t-SNE from scratch: convert distances to probabilities, solve the crowding problem with Student-t distributions, and minimize KL divergence to produce those beautiful 2D cluster plots.
Monte Carlo Methods from Scratch
In 1946, Ulam couldn't solve solitaire analytically — so he played 100 games and counted. Build Monte Carlo methods from scratch: π estimation, importance sampling, rejection sampling, and MCMC — the random sampling toolkit that powers all of modern probabilistic ML.
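The π estimator is the classic opener, and it is four lines. A self-contained sketch:

```python
import numpy as np

def estimate_pi(n, seed=0):
    """Estimate pi from the fraction of uniform points in the unit
    square that land inside the quarter circle x^2 + y^2 <= 1."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))
    inside = (pts ** 2).sum(axis=1) <= 1.0
    return 4.0 * inside.mean()

print(estimate_pi(100_000))   # close to 3.14159; error shrinks like 1/sqrt(n)
```

The 1/√n error rate is the same logic behind Ulam's 100 solitaire games: play enough randomized trials and the counts converge on the answer.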
Bayesian Optimization from Scratch
Grid search takes 7 years. Bayesian optimization finds near-optimal hyperparameters in 20 evaluations. Build BO from scratch — GP surrogates, acquisition functions (EI, UCB, PI), and the sequential optimization loop that powers modern hyperparameter tuning.
Gaussian Processes from Scratch
Most ML models give you a prediction and shrug. Gaussian processes give you a prediction and a confidence interval — they know what they don't know. Build GPs from scratch with kernels, Cholesky decomposition, and Bayesian inference.
Recommender Systems from Scratch
Netflix's recommendation engine is worth $1 billion per year. Build the algorithms behind it from scratch — collaborative filtering, matrix factorization, content-based filtering, and neural methods — to understand how machines predict what you'll love.
Anomaly Detection from Scratch
Knight Capital lost $440 million in 45 minutes from a single bug. Build four anomaly detectors from scratch — z-scores, k-NN, Local Outlier Factor, and Isolation Forest — and learn when each one wins.
Causal Inference from Scratch
Ice cream sales correlate with drowning deaths — should we ban ice cream? Build the math of cause-and-effect from Simpson's paradox through potential outcomes, causal graphs, propensity scores, difference-in-differences, and instrumental variables.
Self-Supervised Learning from Scratch
ImageNet took 25,000 workers two years to label. GPT-4 trained on trillions of unlabeled tokens. Build masked language modeling, masked autoencoders, BYOL, and DINO from scratch — and discover why creating your own labels beats human annotation.
Genetic Algorithms from Scratch
Gradient descent needs gradients. Evolution doesn't. Build genetic algorithms from scratch — selection, crossover, mutation — then solve the Traveling Salesman Problem, evolve neural network weights without backprop, and explore CMA-ES.
Bayesian Inference from Scratch
Your model gives you one answer. Bayesian inference gives you every plausible answer and how much to trust each one. Build from Bayes' theorem through conjugate priors, MAP estimation, and MCMC sampling to Bayesian deep learning.
Information Theory from Scratch
Every loss function speaks the same language — surprise. From Shannon's 1948 insight through entropy, cross-entropy, KL divergence, and perplexity, discover the mathematical thread connecting every algorithm in modern AI.
Logistic Regression from Scratch
The most important algorithm never given its own post. A single neuron IS logistic regression. Build it from maximum likelihood, derive the elegant gradient, extend to multi-class softmax, and watch the exact moment it becomes a neural network.
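The "elegant gradient" this teaser refers to is Xᵀ(p − y)/n — the same shape as linear regression's gradient, with the sigmoid folded in. A minimal NumPy sketch on hypothetical toy data (illustrative only, not the post's derivation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1D data: label is simply whether the feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(float)

w, b, lr = np.zeros(1), 0.0, 0.5
for _ in range(200):
    p = sigmoid(X @ w + b)            # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)   # gradient of cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(acc)  # high accuracy on this separable toy set
```

Swap the sigmoid for a softmax over rows of a weight matrix and the same gradient form gives the multi-class version the post builds.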
Naive Bayes from Scratch
A theorem from 1763 still powers every spam filter you've ever used. Build three Naive Bayes variants from scratch — Gaussian, Multinomial, and Bernoulli — and discover why an algorithm built on a provably wrong assumption consistently embarrasses models a hundred times more complex.
Decision Trees & Random Forests from Scratch
Decision trees are the only ML model you can literally read like a flowchart. Build CART from scratch, learn why unpruned trees memorize noise, then grow a Random Forest and discover how feature subsampling turns weak learners into one of ML's most reliable algorithms.
PCA from Scratch
Karl Pearson published the most important technique in multivariate statistics in a philosophy magazine in 1901. Build PCA from scratch via eigendecomposition and SVD, visualize principal components collapsing a point cloud, and discover when linear projections fail with kernel PCA, t-SNE, and UMAP side-by-side.
K-Means Clustering from Scratch
What if your data has no labels at all? Build K-Means clustering from scratch — Lloyd's algorithm, K-Means++ smart initialization, silhouette scores, and DBSCAN for when K-Means fails. Watch centroids converge step-by-step in interactive demos.
Support Vector Machines from Scratch
Forget gradient descent on a loss — SVMs find the decision boundary with the widest possible margin. Build hard-margin, soft-margin, and kernelized SVMs from scratch, and discover the kernel trick: computing in infinite-dimensional spaces without ever going there.
ML Evaluation from Scratch
Your model got 94% accuracy — but is that actually good? Build proper evaluation from scratch: stratified splits, k-fold cross-validation, metrics beyond accuracy, and statistical significance tests that reveal whether your "improvement" is real or just noise.
Gradient Boosting from Scratch
The algorithm that still dominates Kaggle and production tabular ML isn't a neural network — it's gradient boosting. Build decision trees, random forests, and XGBoost from scratch, and discover why fitting trees to other trees' mistakes is gradient descent in function space.
Time Series Forecasting from Scratch
Unlike NLP where transformers dominate, time series has a richer landscape where simple methods regularly beat deep learning. Build five forecasting methods from moving averages to temporal transformers, and discover why the M-competitions keep proving that less is often more.
Instruction Tuning from Scratch
Your pre-trained model knows everything but can't follow a single instruction. Instruction tuning is the step that transforms a raw text predictor into a helpful assistant — and it works with shockingly little data. Build SFT from scratch and discover why 1,000 perfect examples beat 50,000 mediocre ones.
Mechanistic Interpretability from Scratch
We've built 48 posts teaching how to build neural networks — now let's open the hood and see what they actually learn inside. Build the core interpretability toolkit from scratch: superposition models, probing classifiers, activation patching, the logit lens, attention head taxonomy, and sparse autoencoders for feature extraction.
Learning Rate Schedules from Scratch
The same model trains perfectly or fails completely based on a single curve. Build every major LR schedule from scratch — constant, step decay, cosine annealing with warm restarts, warmup + cosine (the GPT recipe), and cyclical rates. Interactive demos let you explore schedules on a 2D loss landscape and run the LR range test to find the sweet spot automatically.
Neural Network Pruning from Scratch
Your neural network is 90% dead weight — literally. Prune 70% of weights and lose barely 1% accuracy. Build magnitude pruning, structured vs unstructured approaches, the Lottery Ticket Hypothesis (sparse subnetworks that match dense accuracy), gradual cubic-schedule pruning, and modern one-shot methods like Wanda. Complete the compression trinity: quantization reduces bits, pruning removes weights, distillation shrinks architecture.
Model Merging from Scratch: Combining Neural Networks Without Retraining
Two models trained on different tasks — average their weights and get one model that does both? It sounds like it shouldn't work, but it does. Build every major merging technique from scratch: LERP, SLERP, task arithmetic, TIES-Merging, and DARE. Explore why the loss landscape makes it all possible, with interactive demos to drag merge points and compare methods.
DPO from Scratch: Training LLMs with Human Preferences Without RL
RLHF works but it's a nightmare — four models, PPO instability, reward hacking. DPO eliminates all of it with a single mathematical identity: the reward model is redundant. Derive the DPO loss step by step, implement it from scratch in NumPy, compare it head-to-head with RLHF, and explore the preference optimization zoo: IPO, KTO, and ORPO.
Test-Time Compute from Scratch: How Models Think Longer to Think Better
We spent 42 posts on training-time scaling — but in 2024, a second axis emerged: give models more time to think. A smaller model reasoning for 60 seconds outperforms one 14× larger answering instantly. Build the full stack from scratch: chain-of-thought as compute, Best-of-N verification, MCTS tree search over reasoning steps, and DeepSeek-R1's GRPO that teaches models to think via pure reinforcement learning.
Attention Variants from Scratch: GQA, MQA, and Why Modern LLMs Share Heads
Your attention implementation works perfectly — and it's unusable in production. Every deployed LLM uses variants that trade KV cache memory for quality: MQA shares one K,V across all heads, GQA finds the sweet spot with groups, and sliding window attention makes 128K contexts possible. Build each from scratch with NumPy and see the Pareto frontier of quality vs efficiency.
In-Context Learning from Scratch: How LLMs Learn Without Updating a Single Weight
When you give an LLM a few examples in a prompt and it learns the pattern, no weights change — the model implements a learning algorithm inside its forward pass. Discover how attention literally performs gradient descent, how Anthropic's induction heads form through a sharp phase transition, and how task vectors compress demonstrations into a single direction in activation space.
Backpropagation from Scratch: How Neural Networks Learn by Going Backwards
Every neural network ever trained learned through the same algorithm — not gradient descent, not the loss function, but backpropagation. Derive it from the chain rule, implement manual forward and backward passes through a 3-layer MLP, verify against numerical gradients to 12 decimal places, and watch interactive demos reveal why vanishing gradients kill deep sigmoid networks while ReLU and residual connections save them.
Neural Scaling Laws from Scratch: Why Bigger Models Predictably Win
Before GPT-4 trained a single token, its creators knew roughly how well it would perform — from an equation. Derive the power laws governing model performance, reproduce Kaplan's three scaling axes, understand how Chinchilla corrected the compute-optimal allocation, and explore the LLaMA over-training revolution. Interactive demos let you explore loss surfaces and watch Kaplan vs Chinchilla optimal points diverge as compute grows.
Sparse Autoencoders from Scratch: The Microscope for Neural Networks
Neural networks pack more concepts than they have neurons — a trick called superposition that makes individual neurons uninterpretable. Sparse autoencoders reverse this by expanding into an overcomplete dictionary with sparsity constraints, forcing each feature to represent one clean concept. Build SAEs from scratch, watch polysemantic neurons become monosemantic features, and understand how Anthropic discovered interpretable concepts like "Golden Gate Bridge" inside Claude.
Flow Matching from Scratch: The Simpler Path from Noise to Data
Diffusion models need a noise schedule, a Markov chain, and hundreds of steps. Flow matching draws a straight line from noise to data — learn a velocity field, solve an ODE, done. Build conditional flow matching and rectified flow from scratch, compare side-by-side with diffusion, and watch interactive demos show why straighter paths need fewer steps.
Diffusion Models from Scratch: How AI Learns to Draw by Undoing Noise
The best way to create an image is to learn how to destroy one. Build the complete diffusion pipeline from scratch — the forward process that turns images into static, the denoising U-Net that reverses it, and both DDPM and DDIM sampling algorithms. Then see how Stable Diffusion scales it all up with latent space compression and CLIP-guided text conditioning. Interactive demos let you watch noise destroy and recreate a Swiss roll, and explore why cosine schedules beat linear ones.
Flash Attention from Scratch: Why the Fastest Attention Algorithm Never Materializes the Attention Matrix
Standard attention's dirty secret isn't math — it's memory. For a 32K-token sequence, it writes a billion-element matrix per head per layer. Flash Attention computes exact attention without ever storing it, using IO-aware tiling and the online softmax trick. Build it from scratch in NumPy, prove it's bit-identical to naive attention, and watch interactive demos visualize the 10x memory traffic reduction that made 100K+ token contexts possible.
Rotary Position Embeddings from Scratch: How Modern LLMs Know Where Words Are
Every modern LLM uses RoPE — but how does rotating vectors encode position? Build RoPE from scratch in NumPy, prove why relative positions emerge from dot products, explore the complex number interpretation, and implement the scaling tricks (PI, NTK, YaRN) that push context windows to 128K+ tokens. Interactive demos let you rotate query and key vectors and visualize frequency bands across scaling strategies.
Activation Functions from Scratch: Why Every Neuron Needs a Plot Twist
Prove that neural networks without activations collapse to a single linear layer, then build every major activation function from scratch in NumPy — from sigmoid's vanishing gradients through the ReLU revolution to GELU's smooth probabilistic gating. A controlled training experiment races 8 activations head-to-head, and interactive demos let you overlay any functions and watch gradient flow vanish (or survive) through a 6-layer network.
CLIP from Scratch: Teaching Machines to See and Read at the Same Time
Build CLIP from scratch — the model that powers Stable Diffusion, zero-shot classification, and multimodal search. Construct dual encoders, derive the symmetric contrastive loss over an NxN similarity matrix, and implement zero-shot image classification in 10 lines of code. Interactive embedding explorer shows how images and text cluster in the shared space.
Graph Neural Networks from Scratch: How AI Learns to Reason About Relationships
The series built models for grids (CNNs), sequences (transformers, RNNs, SSMs), and unstructured data — but missed the most general data structure: graphs. Build GCN, GraphSAGE, and GAT from pure NumPy using the universal message passing framework. Three interactive demos let you watch messages flow through a graph, explore learned attention weights on edges, and see over-smoothing destroy diversity as you stack layers. The architecture that makes transformers and CNNs special cases.
Reinforcement Learning from Scratch: Teaching Machines to Learn from Consequences
The series covered supervised learning (labels) and unsupervised learning (patterns) — but missed the third paradigm: learning from consequences. Build MDPs, Q-learning, policy gradients, and actor-critic methods from pure NumPy. Three interactive demos let you watch a gridworld agent's Q-values converge in real time, race exploration strategies on a multi-armed bandit, and train a CartPole balancer with REINFORCE vs Actor-Critic. The foundation that makes RLHF possible.
State Space Models from Scratch: How Mamba Learned to Rival Transformers
The elementary series built RNNs (sequential, O(n)) and transformers (parallel, O(n²)) — but missed the architecture that delivers both: linear-time processing that parallelizes during training. Build the complete SSM pipeline from control theory ODEs through discretization, the convolution trick, HiPPO's optimal memory, and Mamba's selective scan — all from pure NumPy. Three interactive demos let you toggle between continuous, discrete, and convolution views of the same model, explore Mamba's content-aware selection mechanism, and watch attention's O(n²) cost explode against SSM's O(n) line.
Contrastive Learning from Scratch: How AI Learns to See Without Labels
Before CLIP, before DINO, before foundation models — there was a simple idea: teach a network that two views of the same image should be similar, and everything else should be different. Build SimCLR, derive InfoNCE loss, implement CLIP's cross-modal training, and explore DINO's self-distillation — all from pure NumPy. Three interactive demos let you visualize augmentation pairs, drag points on the InfoNCE loss landscape, and watch a mini-CLIP learn zero-shot classification.
Generative Adversarial Networks from Scratch: How Neural Networks Learned to Create by Competing
Pit two neural networks against each other — a Generator that forges data and a Discriminator that spots fakes — and watch the forger become a master through competition alone. Build the complete GAN framework from pure NumPy, derive the minimax objective, confront mode collapse, then fix it with Wasserstein distance. Three interactive demos let you watch 1D training converge in real time, toggle between GAN and WGAN on a mode collapse task, and walk through a generator's learned latent space.
Autoencoders & VAEs from Scratch: How Neural Networks Learn to Compress and Imagine
Build autoencoders and Variational Autoencoders from pure NumPy. Compress images through a bottleneck, explore why vanilla autoencoders can't generate new data, then add the reparameterization trick and KL divergence to create smooth, sampleable latent spaces. Interactive demos let you click anywhere in the latent space to decode images, tune the β regularization knob, and generate novel digits from random noise. The missing link to latent diffusion.
Recurrent Neural Networks from Scratch: How Machines Learned to Remember
Build RNNs, LSTMs, and GRUs from pure NumPy — hidden states, gating mechanisms, and backpropagation through time. Watch the vanishing gradient problem destroy vanilla RNN memory in real time, then see how LSTM's cell state highway fixes it. Interactive demos visualize gradient flow across 50 timesteps, trace hidden state evolution character by character, and race RNN sequential processing against transformer parallelism.
Convolutional Neural Networks from Scratch: How Machines Learned to See Before Transformers
Build a CNN from pure NumPy — convolutions, pooling, feature hierarchies, and a full LeNet architecture. Understand the four inductive biases that made CNNs dominate vision for a decade, and see exactly what Vision Transformers had to beat. Interactive demos animate convolution kernels sliding across images, visualize feature maps at each layer, and race CNN receptive fields against ViT's instant global attention.
Vision Transformers from Scratch: Your Transformer Already Understands Images
The same transformer you built for language works for images — no convolutions needed. Split images into patches, treat them as tokens, and reuse every component from the series. Build a complete ViT with patch embedding, [CLS] token, bidirectional attention, and a classification head. Interactive demos let you patchify images, explore attention heatmaps, and watch position embeddings discover 2D structure.
Diffusion Models from Scratch: How AI Learns to Denoise the Universe
The transformer series built language — now enter the visual frontier. Add Gaussian noise until data is pure static, then train a network to reverse each step. Build DDPM, DDIM fast sampling, and classifier-free guidance on 2D toy data where you can see everything happen.
Knowledge Distillation from Scratch: Teaching Small Models Everything a Big Model Knows
A 70B teacher knows "73% cat, 22% lynx, 4% tiger" — but fine-tuning on its outputs throws away everything except "cat." Build Hinton's distillation loss from scratch, derive the T² gradient scaling, implement three student training regimes, and discover why dark knowledge makes small models punch above their weight.
RLHF from Scratch: How Language Models Learn What Humans Want
A perfect next-token predictor is not a useful assistant. Build the complete RLHF pipeline from scratch — SFT to teach format, reward models with Bradley-Terry preferences, PPO's clipped surrogate objective, and DPO's elegant partition-function cancellation that collapses it all into one loss. Interactive preference arena where you train a reward model with your clicks.
Speculative Decoding from Scratch: How LLMs Generate Text 2-3x Faster
Your GPU sits at 0.6% utilization during text generation. Speculative decoding fixes this — use a small draft model to propose tokens, verify them all in one pass with the big model, and get 2-3x speedup with mathematically identical output. Build the full algorithm including rejection sampling proof.
Mixture of Experts from Scratch: How One Transformer Becomes Eight
Every token through every parameter? Not anymore. Build a complete MoE layer in NumPy — the router, top-k selection, load balancing loss, and expert dispatch — then explore DeepSeek-V3's shared experts, fine-grained routing, and auxiliary-loss-free balancing. Interactive routing visualizer shows collapse vs. balanced assignment.
The Complete Transformer from Scratch: Assembling Every Piece We've Built
The capstone: assemble all 14 components — tokenization, embeddings, positions, attention, normalization, FFN, softmax, and more — into a complete 222K-parameter GPT-style transformer that trains on a CPU and generates text. Interactive Transformer X-Ray lets you watch data flow through every stage.
Feed-Forward Networks from Scratch: The Other Half of Every Transformer Block
Two-thirds of a transformer's parameters live in the FFN, not attention. Build the classic FFN, every major activation function (ReLU, GELU, SiLU), and SwiGLU — the exact architecture powering LLaMA and Mistral — then discover why FFN layers are really massive key-value memory banks.
Normalization from Scratch: Why Every Transformer Layer Needs a Reset Button
Without normalization, activations explode through 50 layers. Build BatchNorm, LayerNorm, and RMSNorm from pure math, discover why BatchNorm fails for transformers, and learn why Pre-Norm RMSNorm conquered modern LLMs — with an interactive signal flow visualizer.
Quantization from Scratch: How LLMs Shrink to Fit Your GPU
A 7B model needs 14 GB in float16 — but what if you could cut that to 4 GB? Build symmetric, asymmetric, and NormalFloat quantization from pure math. Implement GPTQ and QAT, discover why 4-bit is the sweet spot, and explore it all in an interactive quantization playground.
LoRA from Scratch: Fine-Tuning Without Retraining Everything
Full fine-tuning a 7B model needs 60 GB of GPU memory. LoRA does it with 0.1% of the parameters. Build the low-rank decomposition from pure math and NumPy, train a LoRA network, merge it at zero cost, and explore QLoRA — with an interactive rank slider that reveals why weight updates are low-rank.
KV Cache from Scratch: Why LLMs Don't Recompute Everything
Every token generated means recomputing attention over the entire sequence — unless you cache the keys and values. Build the KV cache from scratch, see how it turns O(n²) into O(n), and learn why GQA shrinks the cache by 4-8× in modern LLMs.
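The core idea fits in a few lines — at each generation step, append the new key and value to a cache instead of recomputing the whole history. A toy single-head sketch (with identity projections for brevity; purely illustrative, not the post's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension

def attend(q, K, V):
    """Single-head attention for one new query over all cached keys/values."""
    w = np.exp(q @ K.T / np.sqrt(d))
    w = w / w.sum()
    return w @ V

K_cache, V_cache = [], []
for step in range(5):                     # generate 5 tokens
    x = rng.normal(size=d)                # stand-in for the new token's hidden state
    k, v, q = x, x, x                     # toy projections (identity, for brevity)
    K_cache.append(k)                     # O(1) append per step...
    V_cache.append(v)                     # ...instead of recomputing history
    out = attend(q, np.array(K_cache), np.array(V_cache))

print(out.shape)  # (8,) — one attention output per step, history reused
```

Per step the work is linear in the sequence so far; without the cache, every step would recompute all previous keys and values from scratch.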
Positional Encoding from Scratch: How Transformers Know Word Order
Attention is permutation-invariant — "dog bites man" and "man bites dog" look identical. Build sinusoidal encoding, learned embeddings, and Rotary Position Embeddings (RoPE) from pure NumPy, with the mathematical proof of why RoPE conquered modern LLMs.
Decoding Strategies from Scratch: How LLMs Choose Their Next Word
Build greedy, random, temperature, top-k, nucleus sampling, and beam search from pure Python. Watch each strategy fail, then discover why nucleus sampling conquered text generation — with an interactive playground where you control every parameter.
Optimizers from Scratch: How Neural Networks Actually Learn
Build SGD, Momentum, RMSProp, and Adam from pure math and Python. Watch vanilla gradient descent fail on ravines, then fix it two different ways — and discover why Adam conquered deep learning. Interactive demo lets you race all four optimizers on 2D landscapes.
Loss Functions from Scratch: Why Cross-Entropy Rules Deep Learning
Build MSE, binary cross-entropy, and categorical cross-entropy from pure math. Discover why MSE's gradient vanishes when the model is confidently wrong, and how cross-entropy's logarithmic fix gives gradient descent exactly the push it needs.
Softmax & Temperature from Scratch: How LLMs Make Choices
Build softmax from pure math, break it with overflow, fix it with the subtract-max trick, then explore how temperature reshapes probability distributions — from Boltzmann's 1868 physics to ChatGPT's creativity slider.
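Both the subtract-max trick and the temperature knob fit in one small function — a minimal sketch of what the post derives:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax: subtracting the max logit changes
    nothing mathematically but prevents exp() from overflowing."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()          # the subtract-max trick
    e = np.exp(z)
    return e / e.sum()

logits = [1000.0, 1001.0, 1002.0]   # naive exp() overflows on these
print(softmax(logits))               # stable, sums to 1
print(softmax(logits, temperature=0.1))  # low T sharpens toward the argmax
```

Dividing by temperature before the exponential is all there is to the "creativity slider": T &lt; 1 concentrates mass on the top logit, T &gt; 1 flattens the distribution.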
Tokenization from Scratch: How LLMs Read Text
Build Byte Pair Encoding from scratch in Python. Start with character-level and word-level tokenizers, watch them fail, then build BPE step by step — with an interactive demo where you watch merges compress text in real time.
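One BPE merge round — count adjacent pairs, fuse the most frequent — can be sketched directly (a toy illustration on a three-word corpus, not the post's full tokenizer):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single fused symbol."""
    merged, out, i = pair[0] + pair[1], [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("low lower lowest")
for _ in range(3):  # three merge rounds
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)  # frequent character runs like "low" have fused into single tokens
```

Repeat until you hit a target vocabulary size and the learned merge list *is* the tokenizer.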
Micrograd from Scratch
Build a tiny autograd engine in ~100 lines of Python. Teach numbers to remember their history and compute their own derivatives — then train a neural network with it.
Attention Is All You Need (To Implement)
Implement the Transformer's core attention mechanism from scratch in NumPy. Build scaled dot-product attention, multi-head attention, and positional encoding — with an interactive heatmap to visualize what attention actually sees.
Embeddings from Scratch: How Words Become Vectors
Build Word2Vec's skip-gram from scratch in NumPy. Start with one-hot encoding, discover why it fails, then teach random numbers to understand meaning — with a live interactive demo where you watch words cluster in real time.
Weight Initialization from Scratch: The First Decision That Determines If Your Network Learns
Before your network sees a single training example, you've already made the decision that determines whether it learns or dies. Derive Xavier and Kaiming initialization from first principles, race 12 activation-init combinations head-to-head, and explore modern strategies from orthogonal init to GPT-2's residual scaling. Interactive demos let you watch variance explode or collapse through 30 layers and race four initialization strategies in real time.
Regularization from Scratch: Every Trick That Stops Your Network from Memorizing the Training Set
Your network hit 99.8% training accuracy and 54% test accuracy — it memorized the answers instead of learning the subject. Build every major regularization technique from scratch: L1/L2 weight penalties, dropout, early stopping, data augmentation, and label smoothing. Interactive demos let you watch overfitting happen in real time and race four regularization strategies head-to-head with weight magnitude heatmaps.
Encoder-Decoder from Scratch: The Architecture That Launched Modern NLP
The transformer was born as an encoder-decoder — we just forgot. Build the complete encoder-decoder architecture from LSTM seq2seq through Bahdanau attention to the full transformer, compare it head-to-head against decoder-only, and explore why models like T5, BART, and Whisper still use the original design. Interactive demos let you visualize token flow with cross-attention arrows and race the two architectures side-by-side.
Applied AI
Practical AI for real-world problems
Model Explainability with SHAP and LIME: Opening the Black Box
Your model says "reject the loan" — but why? Dive into the game theory behind Shapley values, build LIME from scratch with perturbation sampling and weighted regression, and compare both frameworks head-to-head for stability, speed, and production readiness.
A/B Testing ML Models in Production
Your new model beats the baseline offline, but will it survive production traffic? Build a complete online experimentation pipeline — z-tests, power analysis, O'Brien-Fleming sequential testing, Thompson Sampling bandits, segment-level analysis with Bonferroni correction, and guardrail metrics. Includes interactive demos for sample size exploration and bandit vs. A/B racing.
Building an AI Search Engine from Scratch
Build a complete search engine from raw text to ranked results — a document ingestion pipeline with SQLite FTS5 and embeddings, hybrid BM25 + vector retrieval with reciprocal rank fusion, cross-encoder reranking, a FastAPI search API, and search quality metrics with P@K, R@K, and NDCG.
ML Experiment Tracking
It's 11 PM, you've run 47 experiments, and your best model is model_final_v2_FINAL_actually_final.pkl. Build a complete experiment tracking system from scratch — a minimal tracker with human-readable JSON/CSV files, a comparison dashboard, a reproducibility framework, MLflow-style experiments with model registry, and a hyperparameter sweep manager with grid, random, and successive halving search strategies.
Building AI Code Review Tools
Your senior engineers spend 20-30% of their time reviewing code. Build four AI review tools from scratch — a single-file analyzer, diff-aware PR reviewer, multi-agent pipeline with specialized security/performance/style agents, and a GitHub bot that posts inline comments. Includes counterintuitive findings: personas hurt accuracy, and multi-pass aggregation boosts F1 by 43%.
Building Multimodal AI Apps: Vision, Documents, and Beyond with Modern VLMs
Two years ago, extracting data from a receipt required OCR, regex, and a prayer. Now it's one API call. Build four multimodal application patterns — document intelligence, chart analysis, batch image processing, and production orchestration — with working Python code, real pricing, and interactive demos.
LLM Memory Systems: Building AI Applications That Remember
Every LLM starts as an amnesiac — each conversation begins from zero. Build five progressively sophisticated memory systems from scratch (conversation buffer, sliding window, summary memory, entity memory, and semantic long-term memory) that make your AI genuinely smarter the more you interact with it.
Context Window Strategies: Fitting the World Into Your LLM's Memory
Context windows keep growing, but bigger isn't always better. Build five progressively sophisticated strategies — smart truncation, chunk-and-summarize, map-reduce, hierarchical summarization, and agentic context management — with a decision framework for choosing the right one. Includes interactive demos showing how strategies process documents and why naive context stuffing fails.
Synthetic Data Generation: Using LLMs to Build Your Own Training Datasets
Fine-tuning works — but where do you get 1,000 labeled examples? Build four progressively sophisticated synthetic data pipelines (self-instruct, few-shot amplification, evol-instruct, and quality filtering) that turn a task description into a production-ready training dataset for $10-$50 in API calls.
Build an LLM Router: Automatically Sending Each Query to the Right Model
Sending every query to GPT-4 is burning money. Build four progressively smarter routing strategies — heuristic, embedding-based, LLM-as-judge, and cascade — that cut API costs 60-80% while preserving quality. Includes production patterns with fallbacks, circuit breakers, and an interactive cost savings calculator.
Running LLMs on Your Own Machine
Go from zero to a locally-served LLM in three commands. Compare Ollama, llama.cpp, and vLLM head-to-head on throughput and latency, master the memory math that determines what fits on your GPU, and use the interactive "Will It Fit?" calculator to find your setup's sweet spot.
Retrieval Reranking: Making RAG Actually Good
Your RAG pipeline retrieves 20 candidates but shows the user the wrong 3. The fix is reranking. Build three rerankers from scratch — cross-encoder, LLM-as-judge, and feature-based — benchmark them head-to-head on NDCG@5 and MRR, and watch documents jump positions in the interactive Reranking Arena.
Systematic Prompt Engineering: From Cargo-Culting to Measurable Results
Stop guessing and start measuring. Five prompt engineering techniques — role framing, few-shot selection, chain-of-thought, output structuring, and negative constraints — each quantified against a 20-case eval set. Build a customer support classifier from 62% to 96% accuracy, one technique at a time, with an interactive Prompt Lab to experiment yourself.
Multi-Agent Orchestration: Building LLM Systems That Delegate, Verify, and Self-Correct
One agent hits walls — context limits, lost focus, inconsistent results. Build three multi-agent orchestration patterns from scratch: Sequential Pipeline, Parallel Fan-Out, and Debate & Consensus, with a budget tracker, real benchmarks, and an interactive Multi-Agent Playground that lets you watch agents coordinate in real time.
LLM Function Calling Done Right: From Raw Prompts to Production Tool Use
Function calling is the mechanism that turns a chatbot into an agent. Build a personal finance assistant with OpenAI and Anthropic side by side, master the three API differences that cause cross-provider bugs, and step through parallel calls, sequential chains, and error handling in the interactive Function Calling Playground.
Guardrails for LLM Applications: Input Validation, Output Filtering, and Prompt Injection Defense
Every tutorial shows how to call an LLM API — almost none show how to stop it from going off the rails. Build a complete three-layer guardrails system: prompt injection detection, PII scanning, output safety filtering, and a production middleware pipeline. Interactive Guardrail Playground lets you try real attacks.
Streaming LLM Responses: Server-Sent Events, Chunked Transfer, and the UX of Waiting
Every chat interface streams tokens one by one — but how does it actually work? Build a complete streaming pipeline from LLM API to browser, with SSE wire protocol deep dive, FastAPI relay, browser rendering patterns, and an interactive Streaming Playground with real-time latency metrics.
Fine-Tuning Language Models: A Practical Guide from Dataset to Deployment
You've hit the prompt engineering ceiling — inconsistent JSON, ignored formatting rules, $100/month for a $3 task. Fine-tuning is the escape hatch. A complete walkthrough from JSONL dataset preparation to OpenAI API fine-tuning and open-source LoRA, with a decision framework, synthetic data generation, evaluation suite, and an interactive ROI calculator that shows exactly when fine-tuning pays for itself.
Evaluating LLM Systems: How to Know If Your AI Actually Works
Most teams ship LLM systems on vibes. Build a proper eval framework from scratch — deterministic assertions, LLM-as-judge with calibrated rubrics, and adversarial tests that break things on purpose. Includes a reusable EvalHarness class and the eval-driven development workflow.
Batch Processing with LLMs: 10,000 API Calls Without Going Broke
Go from one API call to ten thousand without melting your wallet. Build an async pipeline with rate limiting, retries, caching, cost tracking, and checkpointing — plus an interactive calculator to estimate your real costs.
Building AI Agents with Tool Use: From Chat to Action
Build a working AI agent in ~80 lines of Python — no frameworks, just a while loop and tool calling. Covers OpenAI and Anthropic APIs, the ReAct pattern, reliability patterns, and an interactive trace visualizer that lets you step through an agent's reasoning.
Structured Output from LLMs: Getting JSON Every Time
Four progressively more robust approaches to getting valid, schema-compliant JSON from LLMs — from prompt engineering alone to Pydantic + instructor. Working code for OpenAI and Anthropic, with an interactive schema validator demo.
Building a RAG Pipeline from Scratch
Build a complete Retrieval-Augmented Generation system in ~60 lines of Python. Chunk documents, embed them, retrieve relevant context, and generate grounded answers — no LangChain, no LlamaIndex, just raw code.
Using LLMs to Parse Grocery Receipts
Snap a photo, get structured data. Build a pipeline that turns receipt images into a SQLite database — with validation, cost analysis, and useful queries.
Practical Claude Code Patterns for Real Projects
Seven battle-tested patterns from building 21 games and 5 blog posts with Claude Code — the headless loop, hot-swappable prompts, watchdog supervision, and more. Copy-paste recipes, not theory.
Backend & Infra
Databases, performance, and tooling deep dives
Distributed Training Benchmarks: Data Parallel, Model Parallel, and Pipeline Parallel Compared
A single A100 has 80GB — GPT-3 needs 350GB. Benchmark DDP, tensor parallelism, and pipeline parallelism head-to-head across 4 model sizes, then use a decision framework to pick the right strategy.
ML Model Monitoring: Catching Silent Failures
Your model shipped at 95% accuracy — six months later it's wrong 30% of the time and your dashboard shows green. Build drift detectors (PSI, KS, JS divergence), calibration monitors, feature health checks, and CUSUM alerting from scratch.
Load Testing AI APIs: Finding the Breaking Point
Build async load testers from scratch, map throughput S-curves and latency hockey-sticks, and find the exact concurrency where your LLM endpoint breaks — before your users do.
GPU Memory Benchmarks: Will This Model Fit?
Profile GPU memory across model sizes, precisions, and training stages. Concrete formulas and interactive calculators to predict whether your model fits before hitting CUDA OOM.
Serving LLMs at Scale: From Naive to vLLM
Your single-request LLM server handles 24 req/min — production needs 600+. Walk through static batching, continuous batching, PagedAttention, and speculative decoding with Python simulators, then explore interactive demos that visualize scheduling strategies and GPU memory budgets.
LLM Cost Optimization: Cutting Your API Bill by 80% Without Sacrificing Quality
Your LLM bill is growing faster than your user base. Learn the five-layer optimization stack — prompt compression, model routing, tiered caching, batch processing, and token budgets — with real Feb 2026 pricing, Python implementations, and interactive demos that let you calculate savings in real time.
LLM Observability in Production: Tracing, Logging, and Monitoring AI Systems
Your APM says everything's green while your LLM hallucinates and your API bill doubles. Build a complete observability stack — from structured logging and cost attribution to quality monitoring and adaptive alerts — with interactive demos that let you tune thresholds in real time.
Profiling Python AI Code: Finding the Bottleneck Before You Optimize
Benchmark cProfile, py-spy, and Scalene head-to-head on realistic AI workloads, then use a complete profiling workflow to find a 2x speedup in a RAG pipeline. Includes interactive flame graph explorer and bottleneck detective game.
Python Concurrency for AI Workloads: asyncio vs Threading vs Multiprocessing Benchmarked
Benchmark asyncio, threading, and multiprocessing across realistic AI workloads — LLM API calls, parallel tokenization, and hybrid RAG pipelines. Includes interactive demos visualizing GIL contention and serialization crossover points.
Hybrid Search Benchmarks: BM25 + Vector Search vs Either Alone
Build BM25, vector, and hybrid search from scratch, then benchmark all three on 200 queries across 5 categories. Discover that hybrid wins by 14 NDCG points — but only because 30-40% of queries genuinely need both signals.
Vector Search at Small Scale: pgvector vs FAISS vs Brute Force NumPy
Benchmark three approaches to nearest-neighbor search on 10K–100K vectors. Measure indexing time, query latency, memory, and recall — then discover that brute-force NumPy is surprisingly competitive.
SQLite FTS5 vs rapidfuzz: Fuzzy Search Showdown
Head-to-head benchmark on 500K product names. Compare query speed, result quality, batch throughput, and setup complexity — plus a hybrid approach that gets sub-2ms typo-tolerant search even at scale.
Caching LLM Responses: Exact Match, Semantic Cache, and Prompt Hashing Benchmarked
Build and benchmark three caching strategies for LLM APIs — exact match, semantic similarity, and structural prompt hashing. Measure hit rates, latency, and cost savings on a 10K query dataset.
LLM API Latency Benchmarks: OpenAI vs Anthropic vs Local Models Under Real Load
Rigorous benchmarks of 5 LLM APIs across 5 concurrency levels — measuring TTFT, inter-token latency, throughput, error rates, and true cost. The numbers nobody publishes honestly.
DadOps Chronicles
Building with AI, raising kids, shipping code
How Ralph Loop Works
Deep dive into the autonomous coding system that built 21 games and 4 blog posts overnight. Three files, one bash loop, and lessons learned from letting an AI agent run unsupervised.