Elementary AI
Building understanding from the ground up
Imitation Learning from Scratch
Every RL algorithm needs a reward function — but what if you just have a human showing you what to do? Build the complete imitation learning pipeline from scratch: behavioral cloning as supervised learning on expert demonstrations, the devastating distributional shift problem where per-step errors compound quadratically into O(T²) total error over the horizon, DAgger's elegant fix of querying the expert on the learner's own states, feature-matching inverse RL that recovers hidden reward functions from behavior, and Maximum Entropy IRL that resolves reward ambiguity via the Boltzmann distribution. Two interactive demos let you watch BC drift off-path while DAgger self-corrects, and place expert waypoints to recover reward heatmaps in real time.
Hopfield Networks from Scratch
Your brain doesn't search for memories — it completes them. Build the oldest and most physically-grounded neural network from scratch: binary Hopfield networks with Hebbian learning, energy landscapes with convergence proofs, the 0.138N capacity wall from statistical mechanics, continuous Hopfield networks with exponential storage, and the stunning result that softmax attention IS a Hopfield update — making every Transformer a deep associative memory. Two demos let you draw and recall patterns and watch Hopfield and attention produce identical outputs live.
Survival Analysis from Scratch
Every ML model you've built assumes you see the outcome — but what happens when patients drop out, customers haven't churned yet, or machines are still running? Build the complete survival analysis toolkit from scratch: Kaplan-Meier estimators with Greenwood confidence bands, the log-rank test, Cox Proportional Hazards with partial likelihood and Breslow baseline estimation, concordance index evaluation, and Weibull MLE with censoring. Two interactive demos let you explore survival curves with adjustable censoring and watch Cox PH coefficients reshape hazard in real time.
Program Synthesis from Scratch
Every Copilot suggestion is a program synthesizer at work. Build the entire progression from scratch: brute-force enumeration over AST grammars with observational equivalence pruning, CEGIS counterexample-guided refinement (the engine behind Excel's FlashFill), neural-guided search with a hand-trained MLP, and the LLM self-repair loop that powers modern coding assistants. Two interactive demos let you synthesize programs from I/O examples and watch CEGIS converge in real time.
Approximate Nearest Neighbors from Scratch
You have 50 million embeddings and 20ms to find the 10 most similar — brute force is physically impossible. Build locality-sensitive hashing, navigable small-world graphs (HNSW), and product quantization from first principles in NumPy, then race them head-to-head in two interactive demos that make the recall-latency-memory trilemma visceral.
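The first of those three ideas fits in a few lines. A minimal random-hyperplane LSH sketch in NumPy — the toy clusters and plane count below are invented for illustration, not taken from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signatures(vectors, n_planes=64):
    """Hash vectors to binary signatures via random hyperplanes:
    vectors with high cosine similarity tend to share many bits."""
    planes = rng.normal(size=(n_planes, vectors.shape[1]))
    return (vectors @ planes.T > 0).astype(np.uint8)

# toy corpus: two tight clusters of 4-d "embeddings"
base = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
points = np.repeat(base, 3, axis=0) + rng.normal(scale=0.05, size=(6, 4))
sigs = lsh_signatures(points)

# bit agreement is high within a cluster, near 50% across clusters
within = (sigs[0] == sigs[1]).mean()
across = (sigs[0] == sigs[3]).mean()
print(within, across)
```

Bucketing by signature prefix then turns the O(n) scan into a lookup over a handful of candidate buckets — that is the recall-for-latency trade the demos make visceral.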
Neuroevolution from Scratch
What if you could train neural networks the way nature trains organisms — through mutation, selection, and survival of the fittest? Evolve network weights with a simple GA, grow topology and weights simultaneously with NEAT's innovation numbers and speciation, compress millions of weights through HyperNEAT's CPPNs, and scale to industrial problems with evolution strategies. Two interactive demos let you watch a network evolve to solve XOR and race ES against gradient descent on a multi-modal landscape.
Differentiable Programming from Scratch
Backpropagation lets you optimize neural networks. Differentiable programming lets you optimize anything — sorting algorithms, physics simulators, 3D renderers, even database queries. Build two complete autodiff engines (forward-mode with dual numbers and reverse-mode with computational graphs), learn to differentiate through non-differentiable operations using the Straight-Through Estimator and soft relaxations, apply the Implicit Function Theorem to differentiate through solver solutions, push gradients through a physics simulator, and survey how JAX, PyTorch, and Zygote make it all composable.
World Models from Scratch
Model-free RL agents need millions of interactions to learn simple tasks. Humans just imagine the outcome. World Models bring this ability to artificial agents — a VAE compresses observations, an MDN-RNN predicts future states, and a tiny linear controller learns to act entirely inside learned "dreams." Build the full V-M-C pipeline from scratch, train a controller that never touches the real environment, and explore two interactive demos where you watch an agent learn to dream and plan.
Neural Processes from Scratch
Gaussian Processes give beautiful uncertainty but choke at O(n³). Neural networks scale effortlessly but give point predictions with no honesty about what they don't know. Neural Processes combine both — learning to map context points to full predictive distributions in a single forward pass. Build the CNP, Latent NP, and Attentive NP from scratch, train with episodic meta-learning, and explore two interactive demos where you click to add context and watch uncertainty shrink in real time.
Spiking Neural Networks from Scratch
Your brain runs on 20 watts. GPT-4 training used 50 GWh. The secret? Biological neurons communicate with precisely timed electrical spikes, not floating-point numbers. Build the LIF neuron model, encode data as spike trains, learn with STDP, and train deep SNNs with surrogate gradients. Two demos let you inject current into a spiking neuron and watch STDP reshape a synapse in real time.
Domain Adaptation from Scratch
Your model hits 97% accuracy on test data — then drops to 61% in production. The data changed, not the model. Build every major domain adaptation technique from Ben-David's theoretical bound through MMD, CORAL, and DANN with the gradient reversal layer. Two interactive demos let you watch CORAL recover accuracy on shifted data and see adversarial training force domain-invariant features in real time.
Multi-Task Learning from Scratch
What if solving more problems made your model better at each one? Build multi-task networks with hard parameter sharing, then tackle the real challenges: uncertainty weighting to balance loss scales, PCGrad to surgically remove gradient conflicts, and GradNorm to equalize training rates. Two interactive demos let you compare shared vs separate networks and visualize gradient surgery in a 2D loss landscape.
Symbolic Regression from Scratch
Neural networks approximate functions — but what if your model could hand you the actual equation? Build a full genetic programming engine from scratch: expression trees, tournament selection, subtree crossover, parsimony pressure, and linear scaling. Two interactive demos let you evolve expressions in real time and rediscover physics laws from noisy data with Pareto front visualization.
Curriculum Learning from Scratch: Teaching Neural Networks the Way Humans Learn
You learned to add before calculus — why do we train neural networks on shuffled data? Curriculum learning presents easy examples first, and there's deep theory behind it: continuation methods that smooth the loss landscape. Build self-paced learning from scratch, explore focal loss as anti-curriculum, implement forgetting events for data pruning, and watch two interactive demos compare curriculum vs random training in real time.
Mixture Density Networks from Scratch: When Neural Networks Output Probability Distributions Instead of Point Predictions
Standard neural networks predict the average — catastrophically wrong when multiple answers are valid. MDNs output Gaussian mixture parameters instead: means, variances, and mixing weights that describe the full conditional distribution. Build Bishop's 1994 architecture from scratch with the log-sum-exp trick, train on inverse problems, and explore two interactive demos — click to query conditional densities and watch a real-time training heatmap.
Kolmogorov-Arnold Networks from Scratch: Learnable Activations, B-Splines, and the End of Fixed Neurons
MLPs put fixed activations on neurons and learn the weights. KANs flip this — learnable B-spline activations live on edges, nodes just sum. Build KANs from the 1957 Kolmogorov-Arnold theorem through B-spline basis functions, residual SiLU edges, grid refinement training, and symbolic regression. Two interactive demos let you watch splines evolve during training and pit a KAN against an MLP on function approximation.
Byte-Level Models from Scratch: UTF-8 Encodings, ByT5, and the End of Tokenization
Type "hello" — 5 bytes, 1 token. Type "สวัสดี" — 18 bytes, 6 tokens. Same greeting, 6x the cost. Tokenization isn't a solved problem — it's a bottleneck. Build byte-level models from UTF-8 encoding through ByT5's pooling architecture, MegaByte's local-global patching, and MambaByte's linear-time scanning. Two interactive demos let you measure the tokenization tax across languages and watch three architectures scale from 64 to 2048 bytes.
Geometric Deep Learning from Scratch: Symmetry, Groups, and the Blueprint That Unifies CNNs, GNNs, and Transformers
CNNs, GNNs, and Transformers aren't three separate inventions — they're three consequences of one principle: respect the symmetry. Derive convolution from translation equivariance, message passing from permutation equivariance, and attention from set equivariance. Then extend to rotation-equivariant group convolutions and the five-component blueprint that generates any architecture from its symmetry group. Two interactive demos let you verify equivariance and explore the unified framework.
Neural ODEs from Scratch: When Depth Becomes Continuous and Networks Learn to Flow
What happens when you let a ResNet have infinitely many layers? You get a Neural ODE — a network whose forward pass solves a differential equation. Build ODE solvers from scratch, implement the adjoint method for O(1)-memory training, derive continuous normalizing flows with the Hutchinson trace estimator, and see why augmented Neural ODEs break the homeomorphism barrier. Two interactive demos let you watch continuous-depth classification and irregular time series modeling in action.
Energy-Based Models from Scratch: Boltzmann Machines, Contrastive Divergence, and Score Matching
Every probability distribution is an energy landscape — low energy means high probability. Build the framework that unifies generative AI: Hopfield networks as associative memory, RBMs with contrastive divergence, MCMC sampling via Langevin dynamics, and score matching that connects directly to modern diffusion models. Two interactive demos let you watch energy minimization in action.
Video Understanding from Scratch: Optical Flow, 3D Convolutions, and Video Transformers
A video of throwing and a video of catching contain identical frames in different order — temporal ordering is the signal. Build video understanding from first principles: frame differences, Lucas-Kanade optical flow, 3D convolutions (C3D, I3D, R(2+1)D), two-stream networks, divided space-time attention (TimeSformer, ViViT), and modern VideoMAE pretraining. Two interactive demos visualize flow fields and attention patterns.
Text-to-Speech from Scratch: Teaching Machines to Speak with Mel Spectrograms and Neural Vocoders
From phonemes to waveforms — build the complete TTS pipeline: text normalization, duration prediction, Tacotron-style mel generation, Griffin-Lim phase estimation, and neural vocoders like WaveNet and HiFi-GAN. Two interactive demos let you drag phoneme durations and watch Griffin-Lim iteratively reconstruct audio from magnitude spectrograms.
Neural Architecture Search from Scratch: Teaching Machines to Design Neural Networks
You hand-design a neural network and hope it's good enough. But what if a machine could search over billions of possible architectures automatically? Build NAS from first principles: cell-based search spaces, random search with successive halving, evolutionary architecture search, differentiable DARTS with bilevel optimization, weight-sharing supernets, and hardware-aware multi-objective search — with interactive demos that let you explore architecture DAGs and watch three search strategies race across a fitness landscape.
Speech Recognition from Scratch: From Sound Waves to Text with CTC and Attention
You say "Hey Siri, set a timer" and 0.8 seconds of air pressure waves become text. Build the complete ASR pipeline from first principles: mel spectrograms, the CTC forward algorithm for alignment-free training, greedy and beam search decoding with prefix merging, attention-based encoder-decoders, Word Error Rate evaluation, and modern architectures like Whisper and Conformer — with interactive demos that let you explore the CTC trellis and watch beam search outperform greedy decoding in real time.
Knowledge Graphs from Scratch
Google "who directed Inception" and a Knowledge Panel appears instantly — that's a knowledge graph with billions of (entity, relation, entity) triples. Build the complete KG toolkit from first principles: graph construction, TransE translation embeddings, DistMult and ComplEx bilinear models, link prediction evaluation, multi-hop reasoning, and the convergence of KGs with LLMs — with interactive demos that let you explore a live knowledge graph and watch embedding geometry reshape under training.
Dense Retrieval from Scratch
Search for "leaking faucet" and the best result says "stopping drips from fixtures" — zero keyword overlap. Build the entire dense retrieval stack from first principles: BM25 baselines, bi-encoder architecture, InfoNCE contrastive training, hard negative mining, product quantization for billion-scale search, and ColBERT's MaxSim late interaction — with interactive demos that let you explore embedding space and watch contrastive learning reshape it.
Federated Learning from Scratch: Training Models Without Sharing Data
Five hospitals each hold 10,000 MRI scans. HIPAA says they can't pool them. Federated learning says they don't have to. Build the complete FL toolkit from first principles: FedAvg, communication-efficient sparsification with error feedback, FedProx for non-IID data, secure aggregation via pairwise masking, and DP-FedAvg for formal privacy guarantees — with interactive demos that let you run federated training across virtual clients and explore how label skew affects convergence.
Continual Learning from Scratch
Train a neural network on cats vs dogs — 95% accuracy. Now train it on cars vs trucks — 93%. Test it on cats vs dogs again: 52%. This is catastrophic forgetting. Build the complete continual learning toolkit from first principles: EWC, experience replay with reservoir sampling, PackNet, Learning without Forgetting, and evaluation metrics — with interactive demos that let you watch forgetting happen in real time and see how each defense preserves knowledge.
Differential Privacy from Scratch
Anonymization is fundamentally broken — Netflix, AOL, and hospital records have all been re-identified. Build the complete differential privacy toolkit from first principles: randomized response, Laplace and Gaussian mechanisms, composition theorems, DP-SGD for private deep learning, and the exponential mechanism — with interactive demos that let you spend a privacy budget on real queries and watch DP-SGD train in real time.
Audio Features from Scratch: From Sound Waves to Spectrograms, MFCCs, and Neural Audio
Every song and every spoken word is just a list of integers. Build the complete audio feature pipeline from scratch — Fourier transforms, spectrograms, mel filterbanks, MFCCs — with interactive demos that let you hear signals, visualize the time-frequency tradeoff, and watch the MFCC extraction pipeline step by step.
Semantic Segmentation from Scratch: Classifying Every Pixel in an Image
Object detection draws rectangles. Semantic segmentation labels every single pixel. Build the complete segmentation pipeline from scratch — FCN, U-Net skip connections, dilated convolutions, Dice loss — with interactive demos comparing FCN's blurry predictions to U-Net's sharp boundaries and watching Dice loss outperform cross-entropy on imbalanced data.
Object Detection from Scratch: Finding and Labeling Every Object in an Image
Image classification says "cat." Object detection says "cat at (120, 45, 280, 190)." Build the complete detection pipeline from scratch — IoU, anchor boxes, NMS, YOLO's single-shot grid, anchor-free FCOS, and focal loss — with interactive demos for dragging bounding boxes and watching a detector learn in real time.
Data Augmentation from Scratch: Training Better Models with the Data You Already Have
Deep learning is hungry for data, but what if you could train better models without collecting more? Build the complete data augmentation toolkit from first principles — geometric transforms, color jitter, Mixup, CutMix, RandAugment, and text augmentation — with interactive demos showing how augmentation tames overfitting.
Active Learning from Scratch: Teaching Your Model to Ask the Right Questions
You have 100,000 unlabeled examples and a budget for 500 labels — which 500 should you pick? Build active learning from first principles: implement uncertainty sampling, query-by-committee, expected gradient length, and batch diversity strategies, then explore failure modes and modern LLM applications — with interactive demos pitting an active learner against random selection.
Implicit Bias of Gradient Descent from Scratch: Why Your Optimizer Is Secretly a Regularizer
Train an overparameterized model with no explicit regularization — yet it generalizes. Build implicit bias theory from scratch: prove GD finds minimum-norm solutions in linear regression, derive max-margin convergence for logistic loss, show depth induces low-rank bias via matrix factorization, demonstrate the edge of stability where sharpness self-stabilizes at 2/η, and explain why small-batch SGD finds flatter minima — with interactive demos for a minimum-norm explorer and a flat-vs-sharp minima visualizer.
Neural Tangent Kernels from Scratch: Why Infinitely Wide Networks Are Just Kernel Machines
A network with 10 million parameters fits 1,000 points perfectly — yet generalizes. Build Neural Tangent Kernel theory from scratch: derive the NTK as a Jacobian Gram matrix, prove the infinite-width convergence to a deterministic kernel, analyze training dynamics with exponential loss decay and spectral bias, measure the lazy-vs-rich regime transition across widths, and compute the analytic arccosine NTK recursion — with interactive demos for an empirical NTK explorer with eigenvalue spectrum and a lazy-vs-rich training visualizer.
Second-Order Optimization from Scratch: Beyond Gradient Descent with Curvature Information
Gradient descent treats all directions equally — but loss landscapes have curvature. Build second-order optimization from scratch: derive Newton's method and its quadratic convergence, implement L-BFGS two-loop recursion, compute natural gradients via the Fisher information matrix, approximate curvature with K-FAC for deep networks, and get Hessian-vector products for free — with interactive demos for an optimizer trajectory arena and a curvature visualizer.
Online Learning from Scratch: Making Decisions One at a Time with Regret Guarantees
Most ML assumes your data is i.i.d. What if it's adversarial? Build online learning from first principles — derive multiplicative weights with optimal regret bounds, implement online gradient descent with projection, unify everything under the Follow-the-Regularized-Leader framework, prove the online-to-batch conversion, and add AdaGrad's per-coordinate adaptivity — with interactive demos for an expert advice arena with adversarial modes and an online vs batch decision boundary visualizer.
Kernel Methods from Scratch: The Trick That Lets Linear Models Learn Nonlinear Patterns
Your data lives in 2D but the decision boundary is a circle. The fix: map to a higher-dimensional space where a hyperplane works. But what if that space is infinite-dimensional? Build the kernel trick from first principles — prove polynomial kernels compute exact feature-space dot products, verify Mercer's theorem via Gram matrix eigenvalues, compare six kernels side by side, compose custom kernels with algebraic closure properties, and kernelize ridge regression and PCA — with interactive demos for a kernel PCA feature-space visualizer and a Gram matrix explorer with eigenvalue spectrum.
Spectral Clustering from Scratch: Using Eigenvalues to Find Hidden Structure
K-means slices a straight line through your concentric rings and calls it a day. Build spectral clustering from scratch: construct RBF similarity graphs with three sparsification strategies, compute the graph Laplacian and its normalized variant, extract eigenvectors that unfold non-convex shapes into linearly separable embeddings, and run k-means in eigenspace with NJW row-normalization — with interactive demos for a 5-stage spectral clustering pipeline with σ tuning and a k-means vs spectral side-by-side accuracy arena.
Conditional Random Fields from Scratch: Structured Prediction Beyond Independent Labels
Your classifier tags each word independently and gets "mat" wrong because it never looks at neighboring predictions. Build a linear-chain CRF from scratch with emission and transition potentials, implement the forward algorithm for partition function computation, Viterbi decoding for optimal sequences, and forward-backward for gradient-based training — with interactive demos for a CRF sequence tagger with animated DP table and a transition matrix explorer showing how pairwise weights reshape predictions.
Semi-Supervised Learning from Scratch: Extracting Supervision from Unlabeled Data
You have 50 labeled images and 50,000 unlabeled ones. Build five semi-supervised methods from scratch: self-training with pseudo-labels, label propagation through RBF similarity graphs, Π-Model consistency regularization, entropy minimization, and MixMatch combining augmentation averaging, temperature sharpening, and MixUp — with interactive demos for a label propagation explorer and a supervised vs semi-supervised decision boundary comparison.
Feature Selection from Scratch: Finding the Signal in a Sea of Variables
You add 50 new features expecting better predictions — instead, accuracy drops. Build six feature selection methods from scratch: filter methods with mutual information, mRMR for redundancy-aware selection, forward selection and RFE wrappers, Lasso coordinate descent with regularization paths, permutation importance, and stability selection via bootstrapping — with interactive demos for a three-method importance arena and a curse of dimensionality visualizer.
Optimal Transport from Scratch: Moving Probability Mass at Minimum Cost
You have two piles of sand and want to reshape one into the other at minimum cost. This 200-year-old problem — from Monge's military logistics to Kantorovich's Nobel Prize — turned out to be exactly what modern ML needed. Build the Monge assignment problem, Kantorovich LP relaxation, Sinkhorn's algorithm, and Wasserstein barycenters from scratch — with interactive demos for transport plan visualization and a Wasserstein vs KL distance explorer.
Hierarchical Clustering from Scratch: Building Dendrograms That Reveal Data's Hidden Tree Structure
K-Means makes you pick k upfront. DBSCAN can't tell you which clusters are more similar to each other. Hierarchical clustering builds a dendrogram — a binary tree encoding every possible grouping at once. Build agglomerative clustering with all four linkage criteria, explore the Lance-Williams recurrence, and discover why single linkage equals the minimum spanning tree — with interactive demos for step-by-step merging and a four-way linkage showdown.
Conformal Prediction from Scratch: Distribution-Free Uncertainty with Guaranteed Coverage
MC Dropout and Deep Ensembles give useful uncertainty estimates — but no formal guarantees. Conformal prediction flips the question: what prediction sets are mathematically guaranteed to contain the true answer? Build split conformal methods, adaptive prediction sets, and conformalized quantile regression from first principles — with interactive demos for prediction set exploration and live coverage guarantee verification.
Uncertainty Quantification from Scratch: Teaching Neural Networks to Say "I Don't Know"
Your classifier reports 99% confidence — and is completely wrong. Softmax outputs aren't probabilities; they're normalized logits that grow unboundedly away from the decision boundary. Build reliability diagrams, MC Dropout, Deep Ensembles, and temperature scaling from first principles, and learn when to trust your model — with interactive demos for uncertainty heatmaps and live calibration tuning.
Meta-Learning from Scratch: Teaching Neural Networks to Learn New Tasks from Just a Few Examples
Show a child five characters from an alien alphabet and they'll start recognizing new ones within minutes. Standard neural networks need thousands of examples. Build prototypical networks and MAML from first principles, discover how episode-based training teaches models to learn from few examples, and connect it all to in-context learning in modern LLMs — with interactive demos for few-shot classification and real-time sinusoid adaptation.
Residual Networks from Scratch: Why Deeper Networks Need Shortcuts
Before ResNets, training networks deeper than 20 layers consistently failed — not from overfitting, but from a mysterious degradation problem. Build residual blocks, projection shortcuts, and bottleneck architectures from scratch, and discover why the skip connection is the single most important idea enabling modern deep learning — with interactive demos showing degradation in action and gradient flow through skip paths.
Adversarial Examples from Scratch: How Invisible Perturbations Fool Neural Networks
Add noise invisible to the human eye and a neural network classifies a panda as a gibbon with 99% confidence. Build FGSM and PGD attacks from first principles, discover why high-dimensional linearity makes every model vulnerable, and train adversarially robust networks — with interactive demos that let you craft attacks and watch decision boundaries shift.
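The FGSM step itself is one line of algebra. A hedged sketch on a hand-written linear classifier (the weights and input below are made up for the example) — for a linear model the loss gradient with respect to the input points along the weight vector, so the attack direction is just its sign:

```python
import numpy as np

# toy linear classifier: label = sign(w.x + b)
w = np.array([0.5, -1.2, 0.8])
b = 0.1

def predict(x):
    return 1 if x @ w + b > 0 else -1

x = np.array([0.9, -0.2, -0.4])
y = predict(x)                     # clean prediction: +1

# the margin y*(w.x + b) has input-gradient y*w, so FGSM
# steps each coordinate against sign(y*w) to shrink the margin
eps = 0.5
x_adv = x - eps * y * np.sign(w)

print(predict(x), predict(x_adv))  # prediction flips: 1 -> -1
```

With images, ε is tiny relative to pixel range — which is what makes the perturbation invisible; it is exaggerated here so a three-dimensional toy flips.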
Double Descent from Scratch: Why Bigger Models Generalize Better (And Classical Statistics Got It Wrong)
Every textbook teaches the bias-variance tradeoff: bigger models overfit. But GPT-4 has trillions of parameters and generalizes beautifully. Build the double descent curve from polynomial regression through neural networks, see how regularization masks the interpolation peak, and understand why scaling works — with interactive demos that let you explore all three regimes.
Normalizing Flows from Scratch: Invertible Neural Networks That Generate by Transforming
VAEs approximate likelihood, GANs abandon it entirely — but normalizing flows compute the exact probability of every data point. Build RealNVP coupling layers, Glow's 1×1 convolutions, and autoregressive flows from first principles, with interactive demos that let you scrub through flow transformations and explore learned densities.
Kalman Filter from Scratch: Predicting the Future by Trusting (But Verifying) Noisy Sensors
Every sensor lies — but the Kalman filter reconstructs truth from noise using elegant matrix algebra. Build the KF, EKF, and sensor fusion from scratch, with interactive demos that let you track objects and visualize how Gaussian fusion always reduces uncertainty.
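The scalar version of that predict-update loop is tiny. A minimal sketch assuming a roughly constant hidden state, with process and measurement noise levels (q, r) invented for the example:

```python
import numpy as np

def kalman_1d(zs, q=1e-3, r=0.5):
    """Minimal 1-D Kalman filter for a nearly constant signal.
    q = process noise variance, r = measurement noise variance."""
    x, p = 0.0, 1.0           # state estimate and its variance
    out = []
    for z in zs:
        p += q                # predict: variance grows by process noise
        k = p / (p + r)       # Kalman gain: how much to trust the sensor
        x += k * (z - x)      # update: move estimate toward measurement
        p *= (1 - k)          # fused variance shrinks below both inputs
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(1)
truth = 5.0
zs = truth + rng.normal(scale=0.7, size=200)   # noisy sensor readings
est = kalman_1d(zs)
print(abs(est[-1] - truth), abs(zs[-1] - truth))
```

The `p *= (1 - k)` line is the "Gaussian fusion always reduces uncertainty" claim in one statement: the posterior variance is strictly smaller than the prior's.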
DBSCAN from Scratch: When K-Means Fails and Density Saves the Day
K-Means bulldozes through non-convex shapes. Build DBSCAN from scratch with BFS cluster expansion, k-distance elbow method, KD-tree acceleration, and HDBSCAN's parameter-free hierarchical clustering — with interactive demos that let you compare all three algorithms on the same data.
Bandit Algorithms from Scratch: The Explore-Exploit Dilemma That Powers Modern AI
Should you exploit what works or explore something new? Build multi-armed bandit algorithms from scratch — greedy, epsilon-greedy, UCB1, and Thompson Sampling — with interactive demos that let you watch algorithms learn in real time.
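Of those four, UCB1 shows the exploration bonus most clearly. A sketch with Bernoulli arms whose means are invented for the example:

```python
import math
import random

def ucb1(means, horizon=5000, seed=0):
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    n = [0] * len(means)      # pull counts per arm
    s = [0.0] * len(means)    # reward sums per arm
    for t in range(1, horizon + 1):
        if t <= len(means):
            a = t - 1          # play each arm once to initialize
        else:
            a = max(range(len(means)),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        s[a] += rng.random() < means[a]   # Bernoulli reward
        n[a] += 1
    return n

pulls = ucb1([0.2, 0.5, 0.8])
print(pulls)   # the best arm (index 2) accumulates most of the pulls
```

The bonus term shrinks as an arm is sampled, so exploration decays automatically — no epsilon schedule to tune.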
K-Nearest Neighbors from Scratch: The Algorithm That Lets the Data Speak for Itself
No parameters, no training loop, no assumptions — just find the closest examples and let them vote. Build KNN from first principles with distance metrics, the curse of dimensionality, KD-trees, and interactive demos that let you paint decision boundaries in real time.
Hidden Markov Models from Scratch: Teaching Machines to Read Between the Lines
When the real signal is hidden, you need algorithms that reason under uncertainty. Build HMMs from absolute first principles — Markov chains, the Forward algorithm, Viterbi decoding, Backward posteriors, and Baum-Welch learning — with interactive demos that animate state transitions and trellis decoding step by step.
Linear Regression from Scratch: The Algorithm That Launched a Thousand Models
Every ML journey starts here. Build linear regression from absolute first principles — closed-form solutions, gradient descent, polynomial features, Ridge and Lasso regularization — with interactive demos that let you fit lines and watch coefficients shrink in real time.
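The closed-form solution mentioned above is a three-line function. A minimal sketch (the data is a noiseless toy line, and for brevity the ridge penalty here also shrinks the bias term, which a careful implementation would exclude):

```python
import numpy as np

def fit_ridge(X, y, lam=0.0):
    """Closed-form (ridge) regression: w = (X^T X + lam*I)^(-1) X^T y."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

# data from y = 2x + 1 with no noise: exact recovery at lam = 0
X = np.arange(10, dtype=float)
y = 2 * X + 1
w = fit_ridge(X, y)
print(w)   # [1. 2.] — intercept 1, slope 2, recovered exactly
```

Using `np.linalg.solve` rather than an explicit inverse is the standard numerically stable choice.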
Expectation-Maximization from Scratch
K-means assumes every cluster is a sphere — real data is messier. Build the EM algorithm from scratch: soft assignments via Bayes' rule, weighted parameter updates, monotonic convergence via the ELBO, and Gaussian Mixture Models that capture elliptical clusters k-means can't.
t-SNE from Scratch: Visualizing High-Dimensional Data
PCA finds the best linear projection — but most interesting structure isn't linear. Build t-SNE from scratch: convert distances to probabilities, solve the crowding problem with Student-t distributions, and minimize KL divergence to produce those beautiful 2D cluster plots.
Monte Carlo Methods from Scratch
In 1946, Ulam couldn't solve solitaire analytically — so he played 100 games and counted. Build Monte Carlo methods from scratch: π estimation, importance sampling, rejection sampling, and MCMC — the random sampling toolkit that powers all of modern probabilistic ML.
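The π estimator is the classic opener, and it is four lines. A self-contained sketch:

```python
import numpy as np

def estimate_pi(n, seed=0):
    """Estimate pi from the fraction of uniform points in the unit
    square that land inside the quarter circle x^2 + y^2 <= 1."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))
    inside = (pts ** 2).sum(axis=1) <= 1.0
    return 4.0 * inside.mean()

print(estimate_pi(100_000))   # close to 3.14159; error shrinks like 1/sqrt(n)
```

The 1/√n error rate is the same logic behind Ulam's 100 solitaire games: play enough randomized trials and the counts converge on the answer.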
Bayesian Optimization from Scratch
Grid search takes 7 years. Bayesian optimization finds near-optimal hyperparameters in 20 evaluations. Build BO from scratch — GP surrogates, acquisition functions (EI, UCB, PI), and the sequential optimization loop that powers modern hyperparameter tuning.
Gaussian Processes from Scratch
Most ML models give you a prediction and shrug. Gaussian processes give you a prediction and a confidence interval — they know what they don't know. Build GPs from scratch with kernels, Cholesky decomposition, and Bayesian inference.
Recommender Systems from Scratch
Netflix's recommendation engine is worth $1 billion per year. Build the algorithms behind it from scratch — collaborative filtering, matrix factorization, content-based filtering, and neural methods — to understand how machines predict what you'll love.
Anomaly Detection from Scratch
Knight Capital lost $440 million in 45 minutes from a single bug. Build four anomaly detectors from scratch — z-scores, k-NN, Local Outlier Factor, and Isolation Forest — and learn when each one wins.
Causal Inference from Scratch
Ice cream sales correlate with drowning deaths — should we ban ice cream? Build the math of cause-and-effect from Simpson's paradox through potential outcomes, causal graphs, propensity scores, difference-in-differences, and instrumental variables.
Self-Supervised Learning from Scratch
ImageNet took 25,000 workers two years to label. GPT-4 trained on trillions of unlabeled tokens. Build masked language modeling, masked autoencoders, BYOL, and DINO from scratch — and discover why creating your own labels beats human annotation.
Genetic Algorithms from Scratch
Gradient descent needs gradients. Evolution doesn't. Build genetic algorithms from scratch — selection, crossover, mutation — then solve the Traveling Salesman Problem, evolve neural network weights without backprop, and explore CMA-ES.
Bayesian Inference from Scratch
Your model gives you one answer. Bayesian inference gives you every plausible answer and how much to trust each one. Build from Bayes' theorem through conjugate priors, MAP estimation, and MCMC sampling to Bayesian deep learning.
Information Theory from Scratch
Every loss function speaks the same language — surprise. From Shannon's 1948 insight through entropy, cross-entropy, KL divergence, and perplexity, discover the mathematical thread connecting every algorithm in modern AI.
Logistic Regression from Scratch
The most important algorithm never given its own post. A single neuron IS logistic regression. Build it from maximum likelihood, derive the elegant gradient, extend to multi-class softmax, and watch the exact moment it becomes a neural network.
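The "elegant gradient" this teaser refers to is Xᵀ(p − y)/n — the same shape as linear regression's gradient, with the sigmoid folded in. A minimal NumPy sketch on hypothetical toy data (illustrative only, not the post's derivation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1D data: label is simply whether the feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(float)

w, b, lr = np.zeros(1), 0.0, 0.5
for _ in range(200):
    p = sigmoid(X @ w + b)            # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)   # gradient of cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(acc)  # high accuracy on this separable toy set
```

Swap the sigmoid for a softmax over rows of a weight matrix and the same gradient form gives the multi-class version the post builds.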
Naive Bayes from Scratch
A theorem from 1763 still powers every spam filter you've ever used. Build three Naive Bayes variants from scratch — Gaussian, Multinomial, and Bernoulli — and discover why an algorithm built on a provably wrong assumption consistently embarrasses models a hundred times more complex.
Decision Trees & Random Forests from Scratch
Decision trees are the only ML model you can literally read like a flowchart. Build CART from scratch, learn why unpruned trees memorize noise, then grow a Random Forest and discover how feature subsampling turns weak learners into one of ML's most reliable algorithms.
PCA from Scratch
Karl Pearson published the most important technique in multivariate statistics in a philosophy magazine in 1901. Build PCA from scratch via eigendecomposition and SVD, visualize principal components collapsing a point cloud, and discover when linear projections fail with kernel PCA, t-SNE, and UMAP side-by-side.
K-Means Clustering from Scratch
What if your data has no labels at all? Build K-Means clustering from scratch — Lloyd's algorithm, K-Means++ smart initialization, silhouette scores, and DBSCAN for when K-Means fails. Watch centroids converge step-by-step in interactive demos.
Support Vector Machines from Scratch
Forget gradient descent on a loss — SVMs find the decision boundary with the widest possible margin. Build hard-margin, soft-margin, and kernelized SVMs from scratch, and discover the kernel trick: computing in infinite-dimensional spaces without ever going there.
ML Evaluation from Scratch
Your model got 94% accuracy — but is that actually good? Build proper evaluation from scratch: stratified splits, k-fold cross-validation, metrics beyond accuracy, and statistical significance tests that reveal whether your "improvement" is real or just noise.
Gradient Boosting from Scratch
The algorithm that still dominates Kaggle and production tabular ML isn't a neural network — it's gradient boosting. Build decision trees, random forests, and XGBoost from scratch, and discover why fitting trees to other trees' mistakes is gradient descent in function space.
Time Series Forecasting from Scratch
Unlike NLP where transformers dominate, time series has a richer landscape where simple methods regularly beat deep learning. Build five forecasting methods from moving averages to temporal transformers, and discover why the M-competitions keep proving that less is often more.
Instruction Tuning from Scratch
Your pre-trained model knows everything but can't follow a single instruction. Instruction tuning is the step that transforms a raw text predictor into a helpful assistant — and it works with shockingly little data. Build SFT from scratch and discover why 1,000 perfect examples beat 50,000 mediocre ones.
Mechanistic Interpretability from Scratch
We've built 48 posts teaching how to build neural networks — now let's open the hood and see what they actually learn inside. Build the core interpretability toolkit from scratch: superposition models, probing classifiers, activation patching, the logit lens, attention head taxonomy, and sparse autoencoders for feature extraction.
Learning Rate Schedules from Scratch
The same model trains perfectly or fails completely based on a single curve. Build every major LR schedule from scratch — constant, step decay, cosine annealing with warm restarts, warmup + cosine (the GPT recipe), and cyclical rates. Interactive demos let you explore schedules on a 2D loss landscape and run the LR range test to find the sweet spot automatically.
Neural Network Pruning from Scratch
Your neural network is 90% dead weight — literally. Prune 70% of weights and lose barely 1% accuracy. Build magnitude pruning, structured vs unstructured approaches, the Lottery Ticket Hypothesis (sparse subnetworks that match dense accuracy), gradual cubic-schedule pruning, and modern one-shot methods like Wanda. Complete the compression trinity: quantization reduces bits, pruning removes weights, distillation shrinks architecture.
Model Merging from Scratch: Combining Neural Networks Without Retraining
Two models trained on different tasks — average their weights and get one model that does both? It sounds like it shouldn't work, but it does. Build every major merging technique from scratch: LERP, SLERP, task arithmetic, TIES-Merging, and DARE. Explore why the loss landscape makes it all possible, with interactive demos to drag merge points and compare methods.
DPO from Scratch: Training LLMs with Human Preferences Without RL
RLHF works but it's a nightmare — four models, PPO instability, reward hacking. DPO eliminates all of it with a single mathematical identity: the reward model is redundant. Derive the DPO loss step by step, implement it from scratch in NumPy, compare it head-to-head with RLHF, and explore the preference optimization zoo: IPO, KTO, and ORPO.
Test-Time Compute from Scratch: How Models Think Longer to Think Better
We spent 42 posts on training-time scaling — but in 2024, a second axis emerged: give models more time to think. A smaller model reasoning for 60 seconds outperforms one 14× larger answering instantly. Build the full stack from scratch: chain-of-thought as compute, Best-of-N verification, MCTS tree search over reasoning steps, and DeepSeek-R1's GRPO that teaches models to think via pure reinforcement learning.
Attention Variants from Scratch: GQA, MQA, and Why Modern LLMs Share Heads
Your attention implementation works perfectly — and it's unusable in production. Every deployed LLM uses variants that trade KV cache memory for quality: MQA shares one K,V across all heads, GQA finds the sweet spot with groups, and sliding window attention makes 128K contexts possible. Build each from scratch with NumPy and see the Pareto frontier of quality vs efficiency.
In-Context Learning from Scratch: How LLMs Learn Without Updating a Single Weight
When you give an LLM a few examples in a prompt and it learns the pattern, no weights change — the model implements a learning algorithm inside its forward pass. Discover how attention literally performs gradient descent, how Anthropic's induction heads form through a sharp phase transition, and how task vectors compress demonstrations into a single direction in activation space.
Backpropagation from Scratch: How Neural Networks Learn by Going Backwards
Every neural network ever trained learned through the same algorithm — not gradient descent, not the loss function, but backpropagation. Derive it from the chain rule, implement manual forward and backward passes through a 3-layer MLP, verify against numerical gradients to 12 decimal places, and watch interactive demos reveal why vanishing gradients kill deep sigmoid networks while ReLU and residual connections save them.
Neural Scaling Laws from Scratch: Why Bigger Models Predictably Win
Before GPT-4 trained a single token, its creators knew roughly how well it would perform — from an equation. Derive the power laws governing model performance, reproduce Kaplan's three scaling axes, understand how Chinchilla corrected the compute-optimal allocation, and explore the LLaMA over-training revolution. Interactive demos let you explore loss surfaces and watch Kaplan vs Chinchilla optimal points diverge as compute grows.
Sparse Autoencoders from Scratch: The Microscope for Neural Networks
Neural networks pack more concepts than they have neurons — a trick called superposition that makes individual neurons uninterpretable. Sparse autoencoders reverse this by expanding into an overcomplete dictionary with sparsity constraints, forcing each feature to represent one clean concept. Build SAEs from scratch, watch polysemantic neurons become monosemantic features, and understand how Anthropic discovered interpretable concepts like "Golden Gate Bridge" inside Claude.
Flow Matching from Scratch: The Simpler Path from Noise to Data
Diffusion models need a noise schedule, a Markov chain, and hundreds of steps. Flow matching draws a straight line from noise to data — learn a velocity field, solve an ODE, done. Build conditional flow matching and rectified flow from scratch, compare side-by-side with diffusion, and watch interactive demos show why straighter paths need fewer steps.
Diffusion Models from Scratch: How AI Learns to Draw by Undoing Noise
The best way to create an image is to learn how to destroy one. Build the complete diffusion pipeline from scratch — the forward process that turns images into static, the denoising U-Net that reverses it, and both DDPM and DDIM sampling algorithms. Then see how Stable Diffusion scales it all up with latent space compression and CLIP-guided text conditioning. Interactive demos let you watch noise destroy and recreate a Swiss roll, and explore why cosine schedules beat linear ones.
Flash Attention from Scratch: Why the Fastest Attention Algorithm Never Materializes the Attention Matrix
Standard attention's dirty secret isn't math — it's memory. For a 32K-token sequence, it writes a billion-element matrix per head per layer. Flash Attention computes exact attention without ever storing it, using IO-aware tiling and the online softmax trick. Build it from scratch in NumPy, prove it's bit-identical to naive attention, and watch interactive demos visualize the 10x memory traffic reduction that made 100K+ token contexts possible.
Rotary Position Embeddings from Scratch: How Modern LLMs Know Where Words Are
Every modern LLM uses RoPE — but how does rotating vectors encode position? Build RoPE from scratch in NumPy, prove why relative positions emerge from dot products, explore the complex number interpretation, and implement the scaling tricks (PI, NTK, YaRN) that push context windows to 128K+ tokens. Interactive demos let you rotate query and key vectors and visualize frequency bands across scaling strategies.
Activation Functions from Scratch: Why Every Neuron Needs a Plot Twist
Prove that neural networks without activations collapse to a single linear layer, then build every major activation function from scratch in NumPy — from sigmoid's vanishing gradients through the ReLU revolution to GELU's smooth probabilistic gating. A controlled training experiment races 8 activations head-to-head, and interactive demos let you overlay any functions and watch gradient flow vanish (or survive) through a 6-layer network.
CLIP from Scratch: Teaching Machines to See and Read at the Same Time
Build CLIP from scratch — the model that powers Stable Diffusion, zero-shot classification, and multimodal search. Construct dual encoders, derive the symmetric contrastive loss over an NxN similarity matrix, and implement zero-shot image classification in 10 lines of code. Interactive embedding explorer shows how images and text cluster in the shared space.
Graph Neural Networks from Scratch: How AI Learns to Reason About Relationships
The series built models for grids (CNNs), sequences (transformers, RNNs, SSMs), and unstructured data — but missed the most general data structure: graphs. Build GCN, GraphSAGE, and GAT from pure NumPy using the universal message passing framework. Three interactive demos let you watch messages flow through a graph, explore learned attention weights on edges, and see over-smoothing destroy diversity as you stack layers. The architecture that makes transformers and CNNs special cases.
Reinforcement Learning from Scratch: Teaching Machines to Learn from Consequences
The series covered supervised learning (labels) and unsupervised learning (patterns) — but missed the third paradigm: learning from consequences. Build MDPs, Q-learning, policy gradients, and actor-critic methods from pure NumPy. Three interactive demos let you watch a gridworld agent's Q-values converge in real time, race exploration strategies on a multi-armed bandit, and train a CartPole balancer with REINFORCE vs Actor-Critic. The foundation that makes RLHF possible.
State Space Models from Scratch: How Mamba Learned to Rival Transformers
The elementary series built RNNs (sequential, O(n)) and transformers (parallel, O(n²)) — but missed the architecture that delivers both: linear-time processing that parallelizes during training. Build the complete SSM pipeline from control theory ODEs through discretization, the convolution trick, HiPPO's optimal memory, and Mamba's selective scan — all from pure NumPy. Three interactive demos let you toggle between continuous, discrete, and convolution views of the same model, explore Mamba's content-aware selection mechanism, and watch attention's O(n²) cost explode against SSM's O(n) line.
Contrastive Learning from Scratch: How AI Learns to See Without Labels
Before CLIP, before DINO, before foundation models — there was a simple idea: teach a network that two views of the same image should be similar, and everything else should be different. Build SimCLR, derive InfoNCE loss, implement CLIP's cross-modal training, and explore DINO's self-distillation — all from pure NumPy. Three interactive demos let you visualize augmentation pairs, drag points on the InfoNCE loss landscape, and watch a mini-CLIP learn zero-shot classification.
Generative Adversarial Networks from Scratch: How Neural Networks Learned to Create by Competing
Pit two neural networks against each other — a Generator that forges data and a Discriminator that spots fakes — and watch the forger become a master through competition alone. Build the complete GAN framework from pure NumPy, derive the minimax objective, confront mode collapse, then fix it with Wasserstein distance. Three interactive demos let you watch 1D training converge in real time, toggle between GAN and WGAN on a mode collapse task, and walk through a generator's learned latent space.
Autoencoders & VAEs from Scratch: How Neural Networks Learn to Compress and Imagine
Build autoencoders and Variational Autoencoders from pure NumPy. Compress images through a bottleneck, explore why vanilla autoencoders can't generate new data, then add the reparameterization trick and KL divergence to create smooth, sampleable latent spaces. Interactive demos let you click anywhere in the latent space to decode images, tune the β regularization knob, and generate novel digits from random noise. The missing link to latent diffusion.
Recurrent Neural Networks from Scratch: How Machines Learned to Remember
Build RNNs, LSTMs, and GRUs from pure NumPy — hidden states, gating mechanisms, and backpropagation through time. Watch the vanishing gradient problem destroy vanilla RNN memory in real time, then see how LSTM's cell state highway fixes it. Interactive demos visualize gradient flow across 50 timesteps, trace hidden state evolution character by character, and race RNN sequential processing against transformer parallelism.
Convolutional Neural Networks from Scratch: How Machines Learned to See Before Transformers
Build a CNN from pure NumPy — convolutions, pooling, feature hierarchies, and a full LeNet architecture. Understand the four inductive biases that made CNNs dominate vision for a decade, and see exactly what Vision Transformers had to beat. Interactive demos animate convolution kernels sliding across images, visualize feature maps at each layer, and race CNN receptive fields against ViT's instant global attention.
Vision Transformers from Scratch: Your Transformer Already Understands Images
The same transformer you built for language works for images — no convolutions needed. Split images into patches, treat them as tokens, and reuse every component from the series. Build a complete ViT with patch embedding, [CLS] token, bidirectional attention, and a classification head. Interactive demos let you patchify images, explore attention heatmaps, and watch position embeddings discover 2D structure.
Diffusion Models from Scratch: How AI Learns to Denoise the Universe
The transformer series built language — now enter the visual frontier. Add Gaussian noise until data is pure static, then train a network to reverse each step. Build DDPM, DDIM fast sampling, and classifier-free guidance on 2D toy data where you can see everything happen.
Knowledge Distillation from Scratch: Teaching Small Models Everything a Big Model Knows
A 70B teacher knows "73% cat, 22% lynx, 4% tiger" — but fine-tuning on its outputs throws away everything except "cat." Build Hinton's distillation loss from scratch, derive the T² gradient scaling, implement three student training regimes, and discover why dark knowledge makes small models punch above their weight.
RLHF from Scratch: How Language Models Learn What Humans Want
A perfect next-token predictor is not a useful assistant. Build the complete RLHF pipeline from scratch — SFT to teach format, reward models with Bradley-Terry preferences, PPO's clipped surrogate objective, and DPO's elegant partition-function cancellation that collapses it all into one loss. Interactive preference arena where you train a reward model with your clicks.
Speculative Decoding from Scratch: How LLMs Generate Text 2-3x Faster
Your GPU sits at 0.6% utilization during text generation. Speculative decoding fixes this — use a small draft model to propose tokens, verify them all in one pass with the big model, and get 2-3x speedup with mathematically identical output. Build the full algorithm including rejection sampling proof.
Mixture of Experts from Scratch: How One Transformer Becomes Eight
Every token through every parameter? Not anymore. Build a complete MoE layer in NumPy — the router, top-k selection, load balancing loss, and expert dispatch — then explore DeepSeek-V3's shared experts, fine-grained routing, and auxiliary-loss-free balancing. Interactive routing visualizer shows collapse vs. balanced assignment.
The Complete Transformer from Scratch: Assembling Every Piece We've Built
The capstone: assemble all 14 components — tokenization, embeddings, positions, attention, normalization, FFN, softmax, and more — into a complete 222K-parameter GPT-style transformer that trains on a CPU and generates text. Interactive Transformer X-Ray lets you watch data flow through every stage.
Feed-Forward Networks from Scratch: The Other Half of Every Transformer Block
Two-thirds of a transformer's parameters live in the FFN, not attention. Build the classic FFN, every major activation function (ReLU, GELU, SiLU), and SwiGLU — the exact architecture powering LLaMA and Mistral — then discover why FFN layers are really massive key-value memory banks.
Normalization from Scratch: Why Every Transformer Layer Needs a Reset Button
Without normalization, activations explode through 50 layers. Build BatchNorm, LayerNorm, and RMSNorm from pure math, discover why BatchNorm fails for transformers, and learn why Pre-Norm RMSNorm conquered modern LLMs — with an interactive signal flow visualizer.
Quantization from Scratch: How LLMs Shrink to Fit Your GPU
A 7B model needs 14 GB in float16 — but what if you could cut that to 4 GB? Build symmetric, asymmetric, and NormalFloat quantization from pure math. Implement GPTQ and QAT, discover why 4-bit is the sweet spot, and explore it all in an interactive quantization playground.
LoRA from Scratch: Fine-Tuning Without Retraining Everything
Full fine-tuning a 7B model needs 60 GB of GPU memory. LoRA does it with 0.1% of the parameters. Build the low-rank decomposition from pure math and NumPy, train a LoRA network, merge it at zero cost, and explore QLoRA — with an interactive rank slider that reveals why weight updates are low-rank.
KV Cache from Scratch: Why LLMs Don't Recompute Everything
Every token generated means recomputing attention over the entire sequence — unless you cache the keys and values. Build the KV cache from scratch, see how it turns O(n²) into O(n), and learn why GQA shrinks the cache by 4-8× in modern LLMs.
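The core idea fits in a few lines — at each generation step, append the new key and value to a cache instead of recomputing the whole history. A toy single-head sketch (with identity projections for brevity; purely illustrative, not the post's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension

def attend(q, K, V):
    """Single-head attention for one new query over all cached keys/values."""
    w = np.exp(q @ K.T / np.sqrt(d))
    w = w / w.sum()
    return w @ V

K_cache, V_cache = [], []
for step in range(5):                     # generate 5 tokens
    x = rng.normal(size=d)                # stand-in for the new token's hidden state
    k, v, q = x, x, x                     # toy projections (identity, for brevity)
    K_cache.append(k)                     # O(1) append per step...
    V_cache.append(v)                     # ...instead of recomputing history
    out = attend(q, np.array(K_cache), np.array(V_cache))

print(out.shape)  # (8,) — one attention output per step, history reused
```

Per step the work is linear in the sequence so far; without the cache, every step would recompute all previous keys and values from scratch.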
Positional Encoding from Scratch: How Transformers Know Word Order
Attention is permutation-invariant — "dog bites man" and "man bites dog" look identical. Build sinusoidal encoding, learned embeddings, and Rotary Position Embeddings (RoPE) from pure NumPy, with the mathematical proof of why RoPE conquered modern LLMs.
Decoding Strategies from Scratch: How LLMs Choose Their Next Word
Build greedy, random, temperature, top-k, nucleus sampling, and beam search from pure Python. Watch each strategy fail, then discover why nucleus sampling conquered text generation — with an interactive playground where you control every parameter.
Optimizers from Scratch: How Neural Networks Actually Learn
Build SGD, Momentum, RMSProp, and Adam from pure math and Python. Watch vanilla gradient descent fail on ravines, then fix it two different ways — and discover why Adam conquered deep learning. Interactive demo lets you race all four optimizers on 2D landscapes.
Loss Functions from Scratch: Why Cross-Entropy Rules Deep Learning
Build MSE, binary cross-entropy, and categorical cross-entropy from pure math. Discover why MSE's gradient vanishes when the model is confidently wrong, and how cross-entropy's logarithmic fix gives gradient descent exactly the push it needs.
Softmax & Temperature from Scratch: How LLMs Make Choices
Build softmax from pure math, break it with overflow, fix it with the subtract-max trick, then explore how temperature reshapes probability distributions — from Boltzmann's 1868 physics to ChatGPT's creativity slider.
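Both the subtract-max trick and the temperature knob fit in one small function — a minimal sketch of what the post derives:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax: subtracting the max logit changes
    nothing mathematically but prevents exp() from overflowing."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()          # the subtract-max trick
    e = np.exp(z)
    return e / e.sum()

logits = [1000.0, 1001.0, 1002.0]   # naive exp() overflows on these
print(softmax(logits))               # stable, sums to 1
print(softmax(logits, temperature=0.1))  # low T sharpens toward the argmax
```

Dividing by temperature before the exponential is all there is to the "creativity slider": T &lt; 1 concentrates mass on the top logit, T &gt; 1 flattens the distribution.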
Tokenization from Scratch: How LLMs Read Text
Build Byte Pair Encoding from scratch in Python. Start with character-level and word-level tokenizers, watch them fail, then build BPE step by step — with an interactive demo where you watch merges compress text in real time.
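One BPE merge round — count adjacent pairs, fuse the most frequent — can be sketched directly (a toy illustration on a three-word corpus, not the post's full tokenizer):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single fused symbol."""
    merged, out, i = pair[0] + pair[1], [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("low lower lowest")
for _ in range(3):  # three merge rounds
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)  # frequent character runs like "low" have fused into single tokens
```

Repeat until you hit a target vocabulary size and the learned merge list *is* the tokenizer.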
Micrograd from Scratch
Build a tiny autograd engine in ~100 lines of Python. Teach numbers to remember their history and compute their own derivatives — then train a neural network with it.
Attention Is All You Need (To Implement)
Implement the Transformer's core attention mechanism from scratch in NumPy. Build scaled dot-product attention, multi-head attention, and positional encoding — with an interactive heatmap to visualize what attention actually sees.
Embeddings from Scratch: How Words Become Vectors
Build Word2Vec's skip-gram from scratch in NumPy. Start with one-hot encoding, discover why it fails, then teach random numbers to understand meaning — with a live interactive demo where you watch words cluster in real time.
Weight Initialization from Scratch: The First Decision That Determines If Your Network Learns
Before your network sees a single training example, you've already made the decision that determines whether it learns or dies. Derive Xavier and Kaiming initialization from first principles, race 12 activation-init combinations head-to-head, and explore modern strategies from orthogonal init to GPT-2's residual scaling. Interactive demos let you watch variance explode or collapse through 30 layers and race four initialization strategies in real time.
Regularization from Scratch: Every Trick That Stops Your Network from Memorizing the Training Set
Your network hit 99.8% training accuracy and 54% test accuracy — it memorized the answers instead of learning the subject. Build every major regularization technique from scratch: L1/L2 weight penalties, dropout, early stopping, data augmentation, and label smoothing. Interactive demos let you watch overfitting happen in real time and race four regularization strategies head-to-head with weight magnitude heatmaps.
Encoder-Decoder from Scratch: The Architecture That Launched Modern NLP
The transformer was born as an encoder-decoder — we just forgot. Build the complete encoder-decoder architecture from LSTM seq2seq through Bahdanau attention to the full transformer, compare it head-to-head against decoder-only, and explore why models like T5, BART, and Whisper still use the original design. Interactive demos let you visualize token flow with cross-attention arrows and race the two architectures side-by-side.
Applied AI
Practical AI for real-world problems
Model Explainability with SHAP and LIME: Opening the Black Box
Your model says "reject the loan" — but why? Dive into the game theory behind Shapley values, build LIME from scratch with perturbation sampling and weighted regression, and compare both frameworks head-to-head for stability, speed, and production readiness.
A/B Testing ML Models in Production
Your new model beats the baseline offline, but will it survive production traffic? Build a complete online experimentation pipeline — z-tests, power analysis, O'Brien-Fleming sequential testing, Thompson Sampling bandits, segment-level analysis with Bonferroni correction, and guardrail metrics. Includes interactive demos for sample size exploration and bandit vs. A/B racing.
Building an AI Search Engine from Scratch
Build a complete search engine from raw text to ranked results — a document ingestion pipeline with SQLite FTS5 and embeddings, hybrid BM25 + vector retrieval with reciprocal rank fusion, cross-encoder reranking, a FastAPI search API, and search quality metrics with P@K, R@K, and NDCG.
ML Experiment Tracking
It's 11 PM, you've run 47 experiments, and your best model is model_final_v2_FINAL_actually_final.pkl. Build a complete experiment tracking system from scratch — a minimal tracker with human-readable JSON/CSV files, a comparison dashboard, a reproducibility framework, MLflow-style experiments with model registry, and a hyperparameter sweep manager with grid, random, and successive halving search strategies.
Building AI Code Review Tools
Your senior engineers spend 20-30% of their time reviewing code. Build four AI review tools from scratch — a single-file analyzer, diff-aware PR reviewer, multi-agent pipeline with specialized security/performance/style agents, and a GitHub bot that posts inline comments. Includes counterintuitive findings: personas hurt accuracy, and multi-pass aggregation boosts F1 by 43%.
Building Multimodal AI Apps: Vision, Documents, and Beyond with Modern VLMs
Two years ago, extracting data from a receipt required OCR, regex, and a prayer. Now it's one API call. Build four multimodal application patterns — document intelligence, chart analysis, batch image processing, and production orchestration — with working Python code, real pricing, and interactive demos.
LLM Memory Systems: Building AI Applications That Remember
Every LLM starts as an amnesiac — each conversation begins from zero. Build five progressively sophisticated memory systems from scratch (conversation buffer, sliding window, summary memory, entity memory, and semantic long-term memory) that make your AI genuinely smarter the more you interact with it.
Context Window Strategies: Fitting the World Into Your LLM's Memory
Context windows keep growing, but bigger isn't always better. Build five progressively sophisticated strategies — smart truncation, chunk-and-summarize, map-reduce, hierarchical summarization, and agentic context management — with a decision framework for choosing the right one. Includes interactive demos showing how strategies process documents and why naive context stuffing fails.
Synthetic Data Generation: Using LLMs to Build Your Own Training Datasets
Fine-tuning works — but where do you get 1,000 labeled examples? Build four progressively sophisticated synthetic data pipelines (self-instruct, few-shot amplification, evol-instruct, and quality filtering) that turn a task description into a production-ready training dataset for $10-$50 in API calls.
Build an LLM Router: Automatically Sending Each Query to the Right Model
Sending every query to GPT-4 is burning money. Build four progressively smarter routing strategies — heuristic, embedding-based, LLM-as-judge, and cascade — that cut API costs 60-80% while preserving quality. Includes production patterns with fallbacks, circuit breakers, and an interactive cost savings calculator.
Running LLMs on Your Own Machine
Go from zero to a locally-served LLM in three commands. Compare Ollama, llama.cpp, and vLLM head-to-head on throughput and latency, master the memory math that determines what fits on your GPU, and use the interactive "Will It Fit?" calculator to find your setup's sweet spot.
Retrieval Reranking: Making RAG Actually Good
Your RAG pipeline retrieves 20 candidates but shows the user the wrong 3. The fix is reranking. Build three rerankers from scratch — cross-encoder, LLM-as-judge, and feature-based — benchmark them head-to-head on NDCG@5 and MRR, and watch documents jump positions in the interactive Reranking Arena.
Systematic Prompt Engineering: From Cargo-Culting to Measurable Results
Stop guessing and start measuring. Five prompt engineering techniques — role framing, few-shot selection, chain-of-thought, output structuring, and negative constraints — each quantified against a 20-case eval set. Build a customer support classifier from 62% to 96% accuracy, one technique at a time, with an interactive Prompt Lab to experiment yourself.
Multi-Agent Orchestration: Building LLM Systems That Delegate, Verify, and Self-Correct
One agent hits walls — context limits, lost focus, inconsistent results. Build three multi-agent orchestration patterns from scratch: Sequential Pipeline, Parallel Fan-Out, and Debate & Consensus, with a budget tracker, real benchmarks, and an interactive Multi-Agent Playground that lets you watch agents coordinate in real time.
LLM Function Calling Done Right: From Raw Prompts to Production Tool Use
Function calling is the mechanism that turns a chatbot into an agent. Build a personal finance assistant with OpenAI and Anthropic side by side, master the three API differences that cause cross-provider bugs, and step through parallel calls, sequential chains, and error handling in the interactive Function Calling Playground.
Guardrails for LLM Applications: Input Validation, Output Filtering, and Prompt Injection Defense
Every tutorial shows how to call an LLM API — almost none show how to stop it from going off the rails. Build a complete three-layer guardrails system: prompt injection detection, PII scanning, output safety filtering, and a production middleware pipeline. Interactive Guardrail Playground lets you try real attacks.
Streaming LLM Responses: Server-Sent Events, Chunked Transfer, and the UX of Waiting
Every chat interface streams tokens one by one — but how does it actually work? Build a complete streaming pipeline from LLM API to browser, with SSE wire protocol deep dive, FastAPI relay, browser rendering patterns, and an interactive Streaming Playground with real-time latency metrics.
Fine-Tuning Language Models: A Practical Guide from Dataset to Deployment
You've hit the prompt engineering ceiling — inconsistent JSON, ignored formatting rules, $100/month for a $3 task. Fine-tuning is the escape hatch. A complete walkthrough from JSONL dataset preparation to OpenAI API fine-tuning and open-source LoRA, with a decision framework, synthetic data generation, evaluation suite, and an interactive ROI calculator that shows exactly when fine-tuning pays for itself.
Evaluating LLM Systems: How to Know If Your AI Actually Works
Most teams ship LLM systems on vibes. Build a proper eval framework from scratch — deterministic assertions, LLM-as-judge with calibrated rubrics, and adversarial tests that break things on purpose. Includes a reusable EvalHarness class and the eval-driven development workflow.
Batch Processing with LLMs: 10,000 API Calls Without Going Broke
Go from one API call to ten thousand without melting your wallet. Build an async pipeline with rate limiting, retries, caching, cost tracking, and checkpointing — plus an interactive calculator to estimate your real costs.
Building AI Agents with Tool Use: From Chat to Action
Build a working AI agent in ~80 lines of Python — no frameworks, just a while loop and tool calling. Covers OpenAI and Anthropic APIs, the ReAct pattern, reliability patterns, and an interactive trace visualizer that lets you step through an agent's reasoning.
Structured Output from LLMs: Getting JSON Every Time
Four progressively more robust approaches to getting valid, schema-compliant JSON from LLMs — from prompt engineering alone to Pydantic + instructor. Working code for OpenAI and Anthropic, with an interactive schema validator demo.
Building a RAG Pipeline from Scratch
Build a complete Retrieval-Augmented Generation system in ~60 lines of Python. Chunk documents, embed them, retrieve relevant context, and generate grounded answers — no LangChain, no LlamaIndex, just raw code.
Using LLMs to Parse Grocery Receipts
Snap a photo, get structured data. Build a pipeline that turns receipt images into a SQLite database — with validation, cost analysis, and useful queries.
Practical Claude Code Patterns for Real Projects
Seven battle-tested patterns from building 21 games and 5 blog posts with Claude Code — the headless loop, hot-swappable prompts, watchdog supervision, and more. Copy-paste recipes, not theory.
Backend & Infra
Databases, performance, and tooling deep dives
Distributed Training Benchmarks: Data Parallel, Model Parallel, and Pipeline Parallel Compared
A single A100 has 80GB — GPT-3 needs 350GB. Benchmark DDP, tensor parallelism, and pipeline parallelism head-to-head across 4 model sizes, then use a decision framework to pick the right strategy.
ML Model Monitoring: Catching Silent Failures
Your model shipped at 95% accuracy — six months later it's wrong 30% of the time and your dashboard shows green. Build drift detectors (PSI, KS, JS divergence), calibration monitors, feature health checks, and CUSUM alerting from scratch.
Load Testing AI APIs: Finding the Breaking Point
Build async load testers from scratch, map throughput S-curves and latency hockey-sticks, and find the exact concurrency where your LLM endpoint breaks — before your users do.
GPU Memory Benchmarks: Will This Model Fit?
Profile GPU memory across model sizes, precisions, and training stages. Concrete formulas and interactive calculators to predict whether your model fits before hitting CUDA OOM.
Serving LLMs at Scale: From Naive to vLLM
Your single-request LLM server handles 24 req/min — production needs 600+. Walk through static batching, continuous batching, PagedAttention, and speculative decoding with Python simulators, then explore interactive demos that visualize scheduling strategies and GPU memory budgets.
LLM Cost Optimization: Cutting Your API Bill by 80% Without Sacrificing Quality
Your LLM bill is growing faster than your user base. Learn the five-layer optimization stack — prompt compression, model routing, tiered caching, batch processing, and token budgets — with real Feb 2026 pricing, Python implementations, and interactive demos that let you calculate savings in real time.
LLM Observability in Production: Tracing, Logging, and Monitoring AI Systems
Your APM says everything's green while your LLM hallucinates and your API bill doubles. Build a complete observability stack — from structured logging and cost attribution to quality monitoring and adaptive alerts — with interactive demos that let you tune thresholds in real time.
Profiling Python AI Code: Finding the Bottleneck Before You Optimize
Benchmark cProfile, py-spy, and Scalene head-to-head on realistic AI workloads, then use a complete profiling workflow to find a 2x speedup in a RAG pipeline. Includes interactive flame graph explorer and bottleneck detective game.
Python Concurrency for AI Workloads: asyncio vs Threading vs Multiprocessing Benchmarked
Benchmark asyncio, threading, and multiprocessing across realistic AI workloads — LLM API calls, parallel tokenization, and hybrid RAG pipelines. Includes interactive demos visualizing GIL contention and serialization crossover points.
Hybrid Search Benchmarks: BM25 + Vector Search vs Either Alone
Build BM25, vector, and hybrid search from scratch, then benchmark all three on 200 queries across 5 categories. Discover that hybrid wins by 14 NDCG points — but only because 30-40% of queries genuinely need both signals.
Vector Search at Small Scale: pgvector vs FAISS vs Brute Force NumPy
Benchmark three approaches to nearest-neighbor search on 10K–100K vectors. Measure indexing time, query latency, memory, and recall — then discover that brute-force NumPy is surprisingly competitive.
SQLite FTS5 vs rapidfuzz: Fuzzy Search Showdown
Head-to-head benchmark on 500K product names. Compare query speed, result quality, batch throughput, and setup complexity — plus a hybrid approach that gets sub-2ms typo-tolerant search even at scale.
Caching LLM Responses: Exact Match, Semantic Cache, and Prompt Hashing Benchmarked
Build and benchmark three caching strategies for LLM APIs — exact match, semantic similarity, and structural prompt hashing. Measure hit rates, latency, and cost savings on a 10K query dataset.
LLM API Latency Benchmarks: OpenAI vs Anthropic vs Local Models Under Real Load
Rigorous benchmarks of 5 LLM APIs across 5 concurrency levels — measuring TTFT, inter-token latency, throughput, error rates, and true cost. The numbers nobody publishes honestly.
DadOps Chronicles
Building with AI, raising kids, shipping code
How Ralph Loop Works
Deep dive into the autonomous coding system that built 21 games and 4 blog posts overnight. Three files, one bash loop, and lessons learned from letting an AI agent run unsupervised.