Spectral Clustering from Scratch: Using Eigenvalues to Find Hidden Structure
1. When K-Means Fails — The Shape Problem
Two concentric rings of data points. Your eyes instantly see two clusters. K-means sees one confused mess, slicing a straight line through both rings and declaring victory. The problem isn't that k-means is broken — it's that k-means assumes clusters are convex blobs. It assigns each point to the nearest centroid, which means every cluster boundary is a hyperplane. Rings, crescents, spirals? Not a chance.
This is where spectral clustering enters the picture. Instead of measuring distance to a center, it asks a fundamentally different question: which points are connected to each other? The algorithm builds a graph from your data, then uses linear algebra — specifically, the eigenvalues and eigenvectors of the graph's Laplacian matrix — to discover clusters that k-means could never find.
The recipe sounds almost too elegant: build a similarity graph, compute its Laplacian, extract a few eigenvectors, and run k-means in this new eigenspace. But why does this work? That's the real story. Let's build it from scratch.
2. The Similarity Graph — From Points to Connections
Step one of spectral clustering: forget about coordinates. Instead, build a graph where each data point is a node, and edges encode how similar two points are.
The most common approach uses the Gaussian (RBF) kernel. For two points xᵢ and xⱼ, their similarity is:
wᵢⱼ = exp(−‖xᵢ − xⱼ‖² / 2σ²)
The bandwidth parameter σ controls the scale of "local." Small σ means only very close points are connected; large σ connects everything. Getting this right matters — we'll see it in the demos shortly.
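To make σ's effect concrete, here's a quick numeric check (toy numbers of my own choosing) of wᵢⱼ for a pair of points at distance 1:

```python
import numpy as np

d_sq = 1.0  # squared distance ||x_i - x_j||^2 between the two points

# Similarity under several bandwidths: small sigma -> near zero, large sigma -> near one
weights = {sigma: np.exp(-d_sq / (2 * sigma ** 2)) for sigma in [0.1, 0.3, 1.0, 3.0]}
for sigma, w in weights.items():
    print(f"sigma={sigma:>4}: w_ij = {w:.6f}")
```

At σ = 0.1 the pair is effectively disconnected (w ≈ 10⁻²²); at σ = 3.0 they are almost fully connected (w ≈ 0.95). Everything between "connected" and "not connected" is set by this one knob.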
There are three standard ways to build the graph:
- Fully connected: Every pair of points gets an edge weighted by the RBF kernel. Dense but retains all information.
- k-nearest neighbor: Each point connects only to its k closest neighbors. Symmetrize by keeping an edge if either endpoint lists the other as a neighbor. Sparse and efficient.
- ε-neighborhood: Connect points within distance ε. Simple but sensitive to the choice of ε.
Here's how to build all three from scratch:
import numpy as np
from sklearn.datasets import make_moons
X, y_true = make_moons(n_samples=200, noise=0.06, random_state=42)
def rbf_similarity(X, sigma=0.3):
    """Fully connected similarity matrix with RBF kernel."""
    dists_sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=2)
    return np.exp(-dists_sq / (2 * sigma ** 2))

def knn_graph(X, k=10, sigma=0.3):
    """k-nearest neighbor graph (symmetrized)."""
    W_full = rbf_similarity(X, sigma)
    n = W_full.shape[0]
    W_knn = np.zeros_like(W_full)
    for i in range(n):
        neighbors = np.argsort(W_full[i])[-k-1:-1]  # top-k excluding self
        W_knn[i, neighbors] = W_full[i, neighbors]
    return np.maximum(W_knn, W_knn.T)  # symmetrize

def epsilon_graph(X, epsilon=0.5, sigma=0.3):
    """Epsilon-neighborhood graph."""
    dists_sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=2)
    mask = dists_sq < epsilon ** 2
    np.fill_diagonal(mask, False)
    W_full = rbf_similarity(X, sigma)
    return W_full * mask
# Build all three
W_full = rbf_similarity(X, sigma=0.3)
W_knn = knn_graph(X, k=10, sigma=0.3)
W_eps = epsilon_graph(X, epsilon=0.4, sigma=0.3)
print(f"Fully connected: {np.count_nonzero(W_full):.0f} nonzero entries")
print(f"KNN (k=10): {np.count_nonzero(W_knn):.0f} nonzero entries")
print(f"Epsilon (e=0.4): {np.count_nonzero(W_eps):.0f} nonzero entries")
The fully connected graph is dense (every pair linked), while the kNN and ε-neighborhood versions are sparse. For spectral clustering, the fully connected RBF graph is the most common default — it's what we'll use in the demos. The key insight: points within the same crescent get high weights; points across crescents get near-zero weights. The graph captures the shape that raw Euclidean distance misses.
3. The Graph Laplacian — Where Algebra Meets Graph Theory
Now we have a weighted graph. The next question: how do we extract cluster structure from it using linear algebra?
Start with two matrices. The similarity matrix W we already have. The degree matrix D is diagonal, where each entry dᵢᵢ is the sum of row i in W — the total connection strength of node i:
D = diag(d₁, d₂, ..., dₙ) where dᵢ = Σⱼ wᵢⱼ
The unnormalized graph Laplacian is simply:
L = D − W
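Concretely, for a hypothetical 3-node graph with hand-picked weights, each row of L sums to zero by construction (the degree on the diagonal exactly cancels the off-diagonal weights):

```python
import numpy as np

# Hand-picked similarity matrix for a tiny 3-node graph (zero diagonal)
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

D = np.diag(W.sum(axis=1))  # degrees: 0.9, 1.0, 0.3
L = D - W

print(L)
print("Row sums:", L.sum(axis=1))  # all zero, i.e. L @ 1 = 0
```

The zero row sums are exactly why the all-ones vector is always an eigenvector with eigenvalue 0.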
This matrix has remarkable properties:
- It's symmetric and positive semi-definite (all eigenvalues ≥ 0)
- Its smallest eigenvalue is always 0, with eigenvector 1 (the all-ones vector)
- The number of zero eigenvalues equals the number of connected components — this is the fundamental insight
That last property is the magic. If your graph has k perfectly separated clusters (no edges between them), then L has exactly k eigenvalues equal to zero. The corresponding eigenvectors are indicator vectors for each cluster. Even when clusters aren't perfectly separated — when there are a few weak cross-cluster edges — the first k eigenvalues are near zero, and their eigenvectors still encode the cluster structure.
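That component-counting property is easy to verify directly. A toy block-diagonal W (two fully connected components of 3 nodes each, no cross edges, my own construction) yields exactly two zero eigenvalues:

```python
import numpy as np

# Two disconnected components: nodes 0-2 and nodes 3-5
W = np.zeros((6, 6))
W[:3, :3] = 1.0  # component A fully connected
W[3:, 3:] = 1.0  # component B fully connected
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))
L = D - W
eigenvalues, eigenvectors = np.linalg.eigh(L)

# Two zeros (one per component), then strictly positive eigenvalues
print(np.round(eigenvalues, 8))
```

The two eigenvectors paired with the zero eigenvalues span the indicator vectors of the two components, which is precisely the structure spectral clustering exploits.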
There are also normalized variants that handle clusters of different sizes better:
- Symmetric normalized: L_sym = D^(−1/2) L D^(−1/2) = I − D^(−1/2) W D^(−1/2)
- Random walk: L_rw = D^(−1) L = I − D^(−1) W
The random walk Laplacian has a nice interpretation: D^(−1) W is the transition matrix of a random walk on the graph. A random walker tends to stay within a cluster (high internal connectivity) and rarely jumps to another (weak cross-cluster edges). The eigenvectors of L_rw capture exactly these "trapping regions."
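A quick sketch of that interpretation (toy data and variable names are mine): P = D^(−1) W is row-stochastic, so each row is a probability distribution over where the walker steps next.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((20, 2))  # arbitrary toy points

# RBF similarity, same formula as rbf_similarity above
dists_sq = np.sum((X_demo[:, None] - X_demo[None, :]) ** 2, axis=2)
W = np.exp(-dists_sq / (2 * 0.5 ** 2))

d = W.sum(axis=1)
P = W / d[:, None]  # transition matrix D^(-1) W

# Every row sums to 1: a valid probability distribution per node
print("Row sums of P:", np.round(P.sum(axis=1), 6))
```

A walker at node i moves to node j with probability proportional to wᵢⱼ, so strong within-cluster weights translate directly into long within-cluster sojourns.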
def compute_laplacian(W, normalized=False):
    """Compute graph Laplacian from similarity matrix W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    if not normalized:
        return L
    # Symmetric normalized Laplacian: D^(-1/2) L D^(-1/2)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-10))
    L_sym = d_inv_sqrt @ L @ d_inv_sqrt
    return L_sym
# Build Laplacian and examine its spectrum
W = rbf_similarity(X, sigma=0.3)
L = compute_laplacian(W, normalized=False)
eigenvalues, eigenvectors = np.linalg.eigh(L)
print("First 10 eigenvalues:")
print(np.round(eigenvalues[:10], 6))
# Output: [0.0, 0.000003, 2.841, 3.156, 3.982, ...]
# ^--- two near-zero = two clusters!
# ^--- eigengap here!
print(f"\nEigengap (lambda_3 - lambda_2): {eigenvalues[2] - eigenvalues[1]:.4f}")
print(f"This suggests k=2 clusters")
The spectrum tells the story: two eigenvalues near zero (two clusters in the moons data), then a sharp jump at the third eigenvalue. That jump — the eigengap — is how spectral clustering reveals the number of clusters automatically. We'll dig deeper into this in Section 6.
4. The Spectral Clustering Algorithm
With the Laplacian and its eigendecomposition in hand, the full algorithm is surprisingly short:
- Build the similarity matrix W using the RBF kernel with bandwidth σ
- Compute the graph Laplacian L (or its normalized variant)
- Find the eigenvectors for the k smallest eigenvalues of L → stack them as columns into a matrix U ∈ ℝ^(n×k)
- Treat each row of U as a new k-dimensional representation of the corresponding data point
- Run k-means on the rows of U
Why does this work? The eigenvectors of L corresponding to near-zero eigenvalues are approximately piecewise-constant on clusters. Points in the same cluster get similar eigenvector values; points in different clusters get different values. When you plot the rows of U, the clusters that were intertwined in the original space become well-separated blobs — exactly the kind of structure k-means handles perfectly.
The Ng-Jordan-Weiss (NJW) variant adds one extra step: normalize each row of U to unit length before running k-means. This projects points onto a hypersphere, making the clusters even more spherical and k-means even more reliable.
from sklearn.cluster import KMeans
def spectral_clustering(X, k=2, sigma=0.3, normalized=True):
    """Full spectral clustering from scratch."""
    # Step 1: Similarity graph
    W = rbf_similarity(X, sigma)
    # Step 2: Graph Laplacian
    L = compute_laplacian(W, normalized=normalized)
    # Step 3: Eigendecomposition — eigenvectors of the k smallest eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    U = eigenvectors[:, :k]  # first k columns (smallest eigenvalues)
    # Step 4 (NJW): Normalize rows to unit length
    row_norms = np.linalg.norm(U, axis=1, keepdims=True) + 1e-10
    U_normalized = U / row_norms
    # Step 5: K-means in eigenspace
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(U_normalized)
    return labels, eigenvalues, U_normalized
# Run on two moons
labels, evals, embedding = spectral_clustering(X, k=2, sigma=0.3)
accuracy = max(np.mean(labels == y_true), np.mean(labels != y_true))
print(f"Spectral clustering accuracy: {accuracy:.1%}")
# Output: Spectral clustering accuracy: 100.0%
# Compare with k-means on raw data
km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
km_acc = max(np.mean(km_labels == y_true), np.mean(km_labels != y_true))
print(f"K-means accuracy: {km_acc:.1%}")
# Output: K-means accuracy: 73.0%
100% versus 73%. On the two-moons dataset, spectral clustering perfectly separates the crescents while k-means flails. Let's see why by looking at the embedding:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Original data with true labels
axes[0].scatter(X[:, 0], X[:, 1], c=y_true, cmap='coolwarm', s=15)
axes[0].set_title("Original Data")
# Eigenvector embedding
axes[1].scatter(embedding[:, 0], embedding[:, 1], c=y_true, cmap='coolwarm', s=15)
axes[1].set_title("Eigenvector Embedding")
axes[1].set_xlabel("Eigenvector 1")
axes[1].set_ylabel("Eigenvector 2")
# Spectral clustering result
axes[2].scatter(X[:, 0], X[:, 1], c=labels, cmap='coolwarm', s=15)
axes[2].set_title("Spectral Clustering Result")
plt.tight_layout()
plt.savefig("spectral_pipeline.png", dpi=150)
plt.show()
In the eigenvector embedding, the two interleaving crescents become two cleanly separated point clouds. K-means in this transformed space trivially finds the right partition. The Laplacian's eigenvectors have done the heavy lifting — they've unfolded the shape-based structure into a linearly separable representation.
5. Graph Cuts — The Optimization Perspective
There's a beautiful optimization story behind why spectral clustering works. Clustering a graph is really a graph partitioning problem: find groups that minimize the total weight of edges between groups.
The simplest objective is min-cut: minimize cut(A, B) = Σᵢ∈A, ⱼ∈B wᵢⱼ. But min-cut has a trivial (and useless) solution: just isolate a single outlier node. To get balanced partitions, we need to normalize:
- RatioCut: cut(A, B) / |A| + cut(A, B) / |B| — balances by cluster size (number of points)
- Normalized cut (Ncut): cut(A, B) / vol(A) + cut(A, B) / vol(B) — balances by volume (total edge weight within each cluster)
Both objectives are NP-hard to optimize exactly. The spectral approach is a continuous relaxation: instead of requiring cluster indicator vectors with discrete {0, 1} entries, we allow them to take continuous values. This relaxation turns the combinatorial problem into an eigenvalue problem:
- Minimizing RatioCut → second eigenvector of L (the unnormalized Laplacian)
- Minimizing Ncut → second eigenvector of L_rw (the random walk Laplacian)
The second smallest eigenvector of L is called the Fiedler vector. For a two-way partition, you can simply threshold it: points where the Fiedler vector is positive go in one cluster, negative in the other. This gives the best balanced cut of the graph that linear algebra can find.
def graph_cut_values(W, labels):
    """Compute RatioCut and Ncut for a given partition."""
    unique_labels = np.unique(labels)
    n = len(labels)
    # Cut value: total weight of edges crossing clusters
    cut = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                cut += W[i, j]
    # RatioCut: cut/|A| + cut/|B|
    ratiocut = sum(cut / np.sum(labels == lbl) for lbl in unique_labels)
    # Ncut: cut/vol(A) + cut/vol(B)
    volumes = [W[labels == lbl].sum() for lbl in unique_labels]
    ncut = sum(cut / vol for vol in volumes if vol > 0)
    return cut, ratiocut, ncut
W = rbf_similarity(X, sigma=0.3)
# Spectral clustering partition
cut_s, rc_s, nc_s = graph_cut_values(W, labels)
# Random partition for comparison
rng = np.random.RandomState(0)
random_labels = rng.randint(0, 2, size=len(X))
cut_r, rc_r, nc_r = graph_cut_values(W, random_labels)
print(f"{'Metric':<12} {'Spectral':>10} {'Random':>10}")
print(f"{'Cut':<12} {cut_s:>10.2f} {cut_r:>10.2f}")
print(f"{'RatioCut':<12} {rc_s:>10.4f} {rc_r:>10.4f}")
print(f"{'Ncut':<12} {nc_s:>10.4f} {nc_r:>10.4f}")
Spectral clustering's partition achieves dramatically lower cut values than a random partition — it's finding the natural seams in the graph where few edges cross.
6. Choosing k — The Eigengap Heuristic
One of spectral clustering's nicest features: the eigenvalue spectrum tells you how many clusters exist. The eigengap heuristic says: choose k to maximize the gap λₖ₊₁ − λₖ.
The intuition goes back to connected components. If the data had k perfectly isolated clusters (zero edges between them), the Laplacian would have exactly k zero eigenvalues. With real data, clusters are approximately separated — the first k eigenvalues are small, and then there's a jump. That jump is the eigengap.
from sklearn.datasets import make_circles
datasets = {
    "2 clusters (moons)": make_moons(200, noise=0.06, random_state=42)[0],
    "3 clusters (blobs)": np.vstack([
        np.random.RandomState(42).randn(70, 2) * 0.4 + center
        for center in [(-2, 0), (2, 0), (0, 3)]
    ]),
    "2 clusters (circles)": make_circles(200, noise=0.05, factor=0.4,
                                         random_state=42)[0],
}

for name, data in datasets.items():
    W = rbf_similarity(data, sigma=0.5)
    L = compute_laplacian(W, normalized=False)
    evals = np.linalg.eigvalsh(L)
    # Find eigengap
    gaps = np.diff(evals[:10])
    k_est = np.argmax(gaps) + 1
    print(f"{name}")
    print(f"  First 8 eigenvalues: {np.round(evals[:8], 4)}")
    print(f"  Largest gap at k={k_est} (gap={gaps[k_est-1]:.4f})")
    print()
When it works well: clearly separated clusters produce a dramatic eigengap. When it struggles: gradually merging clusters produce a smooth spectrum with no obvious gap. In practice, plotting the eigenvalue sequence and looking for the "elbow" is more robust than blindly taking the maximum gap.
7. Conclusion
Spectral clustering is one of those algorithms that feels like a magic trick until you understand the pieces. Build a graph from your data. Compute the Laplacian. Extract its eigenvectors. Run k-means in eigenspace. Each step is simple on its own, but together they let you discover clusters that distance-based methods can't touch.
The deep insight is that the Laplacian's eigenvectors are a change of basis that makes hidden structure visible. Non-convex shapes become convex. Intertwined clusters separate. The spectrum even tells you how many clusters exist, via the eigengap. It's linear algebra doing what it does best: revealing structure through decomposition.
Of course, spectral clustering isn't free. It requires O(n³) time for the eigendecomposition (though sparse approximations help), and the bandwidth parameter σ needs tuning. For large datasets, approximate methods like the Nyström approximation or spectral clustering with landmark points make it tractable. But for moderate-sized problems where cluster shapes matter, it remains one of the most elegant tools in the ML toolkit.
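As a rough illustration of the scaling tactic (a sketch, not a full Nyström implementation): build a sparse kNN graph and ask an iterative solver like scipy.sparse.linalg.eigsh for only the few smallest eigenpairs, avoiding the dense O(n³) decomposition entirely.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, _ = make_moons(n_samples=2000, noise=0.06, random_state=42)

# Sparse symmetric kNN connectivity graph (0/1 weights)
A = kneighbors_graph(X, n_neighbors=10, mode='connectivity')
A = A.maximum(A.T)  # symmetrize: edge if either endpoint lists the other

# Sparse unnormalized Laplacian L = D - A
d = np.asarray(A.sum(axis=1)).ravel()
L = (sp.diags(d) - A).tocsc()

# Shift-invert with a small negative shift keeps L - sigma*I invertible
# and targets the eigenvalues closest to zero
evals, evecs = eigsh(L, k=4, sigma=-0.01, which='LM')
order = np.argsort(evals)
evals, evecs = evals[order], evecs[:, order]

print(np.round(evals, 6))  # a few near-zero eigenvalues, then a gap
```

The eigenvectors in evecs can then feed the same k-means step as before; only the graph construction and eigensolver change.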
References & Further Reading
- Ulrike von Luxburg — A Tutorial on Spectral Clustering (2007) — The definitive tutorial covering theory, algorithms, and practical advice
- Jianbo Shi & Jitendra Malik — Normalized Cuts and Image Segmentation (2000) — Introduced Ncut for image segmentation, one of spectral clustering's most influential applications
- Andrew Ng, Michael Jordan & Yair Weiss — On Spectral Clustering: Analysis and an Algorithm (2001) — The NJW algorithm with row-normalization that we implemented above
- Miroslav Fiedler — Algebraic Connectivity of Graphs (1973) — Introduced the Fiedler vector and algebraic connectivity
- Fan Chung — Spectral Graph Theory (1997) — Comprehensive mathematical treatment of graph spectra