How Ralph Loop Works: An Autonomous Coding Agent That Built 21 Games
A Blog Post That Wrote Itself
I need to tell you something about this blog post: it was written by the system it describes. The same autonomous loop that built 21 games on this site is the one that researched the codebase, read its own shell scripts, outlined this article, and typed every word you're reading right now.
That's either very cool or the beginning of a Black Mirror episode. I choose to find it cool.
Here's the backstory. I'm a dad. My free time comes in 15-minute chunks between bedtime stories and diaper changes. I wanted a personal site full of educational games for my kids and technical blog posts for myself, but I didn't have the time to hand-code all of it. So I did what any reasonable developer would do: I built an autonomous coding agent to do it for me.
The result is Ralph Loop — a system so simple it's almost embarrassing. Three files, one bash loop, and a copy of Claude Code. It ran overnight and produced 21 playable games. Then I pointed it at a blog task list and it started writing technical articles. This post explains exactly how it works, with real code, real logs, and real lessons learned.
The Architecture: Three Files and a Dream
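The whole system is three files. That's not a simplification; it's literally three files:
- ralph-loop.sh: the loop that calls Claude over and over
- watchdog.sh: the supervisor that restarts the loop when it dies or stalls
- prompt.md: the instructions Claude receives on every iteration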
That's it. No framework. No orchestration platform. No Kubernetes. Just bash calling claude -p in a while true loop, supervised by a watchdog that restarts it when it crashes or gets stuck.
Why bash over Python or TypeScript? Because the loop itself does almost nothing. It reads a file, passes it to Claude, and checks if a stop file exists. There's no state management, no dependency injection, no retry logic with exponential backoff. The intelligence lives entirely in Claude and the prompt — the loop is just a heartbeat.
The Loop Script: 50 Lines of Autonomy
Let me show you the core of ralph-loop.sh. I've stripped out the cosmetic stuff (colored output, banner art) to show the essential logic:
```bash
#!/usr/bin/env bash
set -euo pipefail

PROMPT_FILE="prompt.md"
STOP_FILE="loop_stop.md"
LOOP_LOG="ralph-loop.log"
RAW_LOG="ralph-stream.jsonl"

echo $$ > ralph-loop.pid  # so the watchdog can find us

while true; do
  # Check for stop file
  if [[ -f "$STOP_FILE" ]]; then
    echo "Stop file detected. Exiting."
    exit 0
  fi

  # Read prompt fresh each iteration (hot-swappable!)
  PROMPT="$(cat "$PROMPT_FILE")"

  # Run Claude with streaming JSON for live monitoring
  claude -p "$PROMPT" \
    --output-format stream-json \
    --dangerously-skip-permissions \
    2>&1 | parse_stream

  # Check for stop file again (Claude may have written one)
  if [[ -f "$STOP_FILE" ]]; then
    echo "Stop file detected. Exiting."
    exit 0
  fi

  sleep 3
done
```
A few things worth noticing:
The prompt is re-read every iteration. This is the hot-swap trick. While the loop is running, I can edit prompt.md and the next iteration picks up the new instructions. No restart needed. This saved me multiple times — I could fix a bug in the prompt while Claude was mid-run, and the fix took effect on the very next cycle.
The stop mechanism is a file. Claude itself writes loop_stop.md when it decides all tasks are complete. The loop checks for this file both before and after each Claude invocation. It's the simplest possible graceful shutdown — no signals, no IPC, just a file on disk.
Stream JSON gives you live visibility. The --output-format stream-json flag makes Claude emit newline-delimited JSON events as it works. The parse_stream function reads these in real time and prints a human-friendly view:
```bash
parse_stream() {
  local line type text tool cost duration
  while IFS= read -r line; do
    echo "$line" >> "$RAW_LOG"  # save everything

    # Skip lines that jq cannot parse as JSON
    type=$(echo "$line" | jq -r '.type // empty' 2>/dev/null) || continue

    case "$type" in
      assistant)  # Claude is talking
        text=$(echo "$line" | jq -r '.message.content[]? |
          select(.type == "text") | .text // empty' 2>/dev/null)
        # Use an if so an empty message never leaves a non-zero exit status behind
        if [[ -n "$text" ]]; then
          echo "Claude: $text"
        fi
        ;;
      tool_use)  # Claude is calling a tool
        tool=$(echo "$line" | jq -r '.tool_name' 2>/dev/null)
        echo "Tool: $tool"
        ;;
      result)  # Session finished
        cost=$(echo "$line" | jq -r '.cost_usd // "?"' 2>/dev/null)
        duration=$(echo "$line" | jq -r '.duration_ms // "?"' 2>/dev/null)
        echo "Done | Cost: \$$cost | Time: ${duration}ms"
        ;;
    esac
  done
}
```
Every raw JSON event also gets appended to ralph-stream.jsonl, which means I have a complete record of every tool call, every piece of text Claude generated, and every result. It's like a flight recorder for autonomous agents.
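Because the recorder holds one JSON object per line, it can be queried after the fact with the same jq tool the loop already uses. The following is a minimal sketch, assuming jq is installed; count_event_types is a hypothetical helper for illustration and is not part of ralph-loop.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ralph-loop.sh): tally the events in a
# JSONL log such as ralph-stream.jsonl by their top-level "type" field.
count_event_types() {
  local log_file="$1"
  # -R reads every line as a raw string, so one malformed line cannot abort
  # the run; `fromjson?` drops lines that are not valid JSON (for example
  # stderr noise captured by 2>&1), and `objects` drops non-object values.
  jq -rR 'fromjson? | objects | .type // "unknown"' "$log_file" |
    sort | uniq -c | sort -rn
}

# Usage: count_event_types ralph-stream.jsonl
```

The output is one line per event type, most frequent first, which is a quick way to see whether a session was mostly talking or mostly calling tools.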
The Watchdog: Because Agents Get Stuck
Here's a truth that every autonomous agent builder discovers: agents get stuck. They hang on API calls. They enter infinite retry loops. They write code that deadlocks their own process. If you run an autonomous loop without a supervisor, you will wake up to find it frozen at 3 AM, having accomplished nothing for the last 6 hours.
The watchdog provides three guarantees:
- Startup: If no loop is running when the watchdog starts, it launches one inside a named tmux window
- Crash recovery: Every 30 seconds, it checks if the loop's PID is still alive. If not, restart.
- Stuck detection: If the log file hasn't grown in 30 minutes, assume the loop is stuck and kill it
The stuck detection is the interesting part:
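```bash
LOOP_LOG="ralph-loop.log"  # the same log file the loop script names

# Track log file size to detect stuckness
LAST_LOG_SIZE=0
LAST_LOG_CHANGE="$(date +%s)"

while true; do
  sleep 30

  # stat -c%s is the GNU coreutils form; BSD/macOS stat uses -f%z instead
  CURRENT_LOG_SIZE="$(stat -c%s "$LOOP_LOG" 2>/dev/null || echo 0)"
  NOW="$(date +%s)"

  if [[ "$CURRENT_LOG_SIZE" -ne "$LAST_LOG_SIZE" ]]; then
    # Log is growing: the loop is making progress
    LAST_LOG_SIZE="$CURRENT_LOG_SIZE"
    LAST_LOG_CHANGE="$NOW"
  else
    # Log hasn't changed: how long has it been?
    ELAPSED=$(( NOW - LAST_LOG_CHANGE ))
    if [[ "$ELAPSED" -ge 1800 ]]; then
      echo "STUCK! No output for ${ELAPSED}s. Killing and restarting."
      kill_loop              # helper defined elsewhere in watchdog.sh
      start_loop_in_tmux     # helper defined elsewhere in watchdog.sh
      LAST_LOG_CHANGE="$NOW" # reset the timer so the fresh loop gets a full 30 minutes
    fi
  fi
done
```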
It's crude but effective. The assumption: if Claude is working, it's producing output. If 30 minutes pass with zero output, something is wrong. Kill everything, restart fresh.
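The excerpt calls kill_loop without showing it, and the crash-recovery check is only described in prose. Here is one way those two pieces could look, as a sketch under stated assumptions: it relies only on the ralph-loop.pid file that the loop script writes, the function bodies are my illustration rather than the actual watchdog.sh, and the grace period is an invented parameter.

```bash
#!/usr/bin/env bash
PID_FILE="${PID_FILE:-ralph-loop.pid}"  # written by the loop via `echo $$`

# Succeeds if the PID recorded in the PID file belongs to a live process.
loop_is_alive() {
  local pid
  pid="$(cat "$PID_FILE" 2>/dev/null)" || return 1
  [[ "$pid" =~ ^[0-9]+$ ]] || return 1
  kill -0 "$pid" 2>/dev/null  # signal 0 delivers nothing; it only tests existence
}

# Illustrative stand-in for kill_loop: ask politely, then force.
# $1 is a grace period in seconds (an assumption, default 5).
kill_loop() {
  local pid i grace="${1:-5}"
  pid="$(cat "$PID_FILE" 2>/dev/null)" || return 0
  [[ "$pid" =~ ^[0-9]+$ ]] || return 0
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  for (( i = 0; i < grace; i++ )); do
    kill -0 "$pid" 2>/dev/null || break       # exited within the grace period
    sleep 1
  done
  kill -KILL "$pid" 2>/dev/null || true
  rm -f "$PID_FILE"
}
```

Note that this only signals the recorded PID. A claude process started by the loop is a child of that PID and may need its own cleanup, for example by signalling the whole process group.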
And it's not theoretical. Across two sessions, the watchdog caught four stuck events. Without it, each one would have meant hours of wasted time with the loop just sitting there, burning no tokens and making no progress.
The Phase System: One Bite at a Time
This is the most important design decision in the whole system: one phase per loop iteration. Never combine. Never skip.
For games, the phases were:
- BUILD — Write the game's HTML/JS from scratch
- VALIDATE — Open it in a browser, test it, fix bugs
- POLISH — Improve visuals, responsiveness, edge cases
- INTEGRATE — Add it to the games page, update the task list
For blog posts, a slightly different set:
- RESEARCH — Study the topic, create a detailed outline
- WRITE — Draft the complete article with code and demos
- POLISH — Re-read, fix errors, verify code correctness
- INTEGRATE — Add the blog card to the main blog page
Why not let Claude do everything in one shot? Three reasons:
Context window limits. Claude's context window is large but not infinite. A game that requires building, testing in a real browser, fixing bugs, and integrating into a page can easily blow past the limit if done in one session. Splitting into phases means each session starts fresh with a focused goal.
Mandatory commits. Every phase ends with git add -A && git commit && git push. This means if anything goes wrong — Claude crashes, writes garbage, or gets stuck — the last good state is always in git. I can roll back to any phase boundary.
Resumability. Each task directory has a STATUS.md file that records the current phase and any notes. If the loop crashes mid-task, the next iteration reads STATUS.md and picks up exactly where it left off. No work is lost, no phase is repeated.
The phase system isn't just for organization — it's a crash recovery mechanism. STATUS.md is your resumable checkpoint. Git commits are your safety net. Together, they make the loop fault-tolerant without any complex state management.
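To make the checkpoint idea concrete, here is a small sketch of how a phase could be recorded and the next one derived. In Ralph Loop it is Claude, following the prompt, that reads and writes STATUS.md. The "Completed: PHASE" line format and all three function names below are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Phase order for games, as listed above.
PHASES=(BUILD VALIDATE POLISH INTEGRATE)

# Append a checkpoint line. The "Completed: X" format is hypothetical.
record_phase() {
  local status_file="$1" phase="$2"
  printf 'Completed: %s\n' "$phase" >> "$status_file"
}

# Print the most recently completed phase, or nothing if there is no file yet.
read_completed_phase() {
  local status_file="$1"
  [[ -f "$status_file" ]] || return 0
  sed -n 's/^Completed: *//p' "$status_file" | tail -n 1
}

# Print the phase that should run next, or DONE after the last one.
next_phase() {
  local completed="$1" i
  if [[ -z "$completed" ]]; then
    echo "${PHASES[0]}"
    return 0
  fi
  for i in "${!PHASES[@]}"; do
    if [[ "${PHASES[i]}" == "$completed" ]]; then
      echo "${PHASES[i+1]:-DONE}"
      return 0
    fi
  done
  echo "unknown phase: $completed" >&2
  return 1
}

# Usage: next_phase "$(read_completed_phase games/marble-run/STATUS.md)"
```

Because the checkpoint is only written after a phase finishes, a crash mid-phase leaves the file pointing at the previous phase, and the next iteration simply starts the interrupted phase again.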
There's also a self-perpetuating rule: after every 4th completed blog post, Claude brainstorms a new topic and adds it to the task list. This means the loop never runs out of work. It's theoretically infinite — a system that generates its own tasks, executes them, and then generates more.
The Prompt: Teaching an Agent to Build a Website
prompt.md is the most important file in the system. It's the agent's entire worldview — the only instructions it receives. Everything Claude does flows from this one document.
The prompt is structured in layers:
- Task discovery: "Read task_list.md. Find the first unchecked [ ] task. Read its STATUS.md if it exists."
- Phase execution: Detailed instructions for each of the four phases: what to produce, what quality bar to hit, what to check.
- Quality standards: A blog post template (exact HTML structure), a 20-item validation checklist, code correctness requirements.
- Self-management: "Commit after every phase. Never skip phases. Never combine them."
- Stop condition: "Write loop_stop.md when all tasks are done."
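The task-discovery step is performed by Claude reading the file, but the same lookup is easy to reproduce by hand when you want to check what the loop will pick up next. This is a sketch, assuming task_list.md uses Markdown-style checkbox bullets such as "- [ ] Marble Run"; first_unchecked_task is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the text of the first unchecked "[ ]" task; return 1 if none remain.
# Assumes one task per line, written as "- [ ] name" or "* [ ] name".
first_unchecked_task() {
  local task_list="${1:-task_list.md}" line
  # -m1 makes grep stop at the first matching line
  line="$(grep -m1 -E '^[[:space:]]*[-*] \[ \] ' "$task_list")" || return 1
  # Strip the bullet and checkbox, leaving only the task text
  printf '%s\n' "$line" | sed -E 's/^[[:space:]]*[-*] \[ \] //'
}

# Usage: first_unchecked_task task_list.md || echo "nothing left to do"
```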
The balance between rigidity and freedom is critical. The prompt is very rigid about structure: which phase to execute, when to commit, how to format output. But it's very free about content: how to explain a concept, what analogies to use, how to structure a narrative. This gives Claude guardrails without a straitjacket.
The hot-swappable prompt was a lifesaver in practice. During the games marathon, I noticed Claude was writing games that worked but had tiny text on mobile. I edited one line in prompt.md — "Ensure all text is at least 18px and all touch targets are at least 44px" — and the next game came out perfectly responsive. No restart, no lost progress.
What Ralph Built
The Games Marathon (February 18)
The first run built 21 games in a single overnight session. Thirteen kids' literacy games:
- Word Builder, Typing Fun, Letter Match, Letter Hunt, Phonics Fun
- First Letter, Word Picture, Missing Letter, Rhyme Time, Sight Words
- Word Scramble, Spell It Out, and an Alphabet Upgrade to Word Builder
And seven physics/puzzle games for the grown-ups:
- Balance Stack, Marble Run, Pendulum Wave, Sand Pour
- Spring Mesh, Gear Train (plus a Gear Train bugfix pass)
Every game went through all four phases: build, validate in a real browser (via Playwright), polish, and integrate into the games page. When the last game was integrated, Claude wrote loop_stop.md with the message "All tasks complete" and the loop shut itself down.
The Blog Factory (February 25)
Same system, different prompt, different task list. Four technical blog posts:
- Micrograd from Scratch [elementary] — Autograd engine with interactive computation graph
- Using LLMs to Parse Grocery Receipts [applied] — Vision LLM pipeline with cost analysis
- SQLite FTS5 vs rapidfuzz [backend] — Benchmark showdown with charts and data
- Attention Is All You Need (To Implement) [elementary] — Transformer attention in NumPy with heatmap demo
And then there's this post. The system writing about itself. During the RESEARCH phase, Claude read ralph-loop.sh, watchdog.sh, the watchdog logs, and the JSONL stream data. It used its own source code as research material. I find this delightful.
By the Numbers
From the ralph-stream.jsonl flight recorder:
- 6 sessions logged with full event streams
- 568 total turns (tool calls + responses)
- ~1.8 hours of active compute time for the blog factory
- Longest session: 48 minutes, 303 turns (a multi-post marathon)
- 4 stuck events caught and recovered by the watchdog
One notable gap: the cost_usd field in the stream JSON always comes back as null. My working explanation is that --dangerously-skip-permissions mode doesn't populate cost data, though I haven't ruled out other causes, such as the CLI reporting cost under a different field name. Either way, I have no idea how much this all cost. Lesson learned: if you care about cost tracking, instrument it separately.
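Totals like the ones above can be derived from the result events in the recorder. The sketch below shows one way to do it, assuming jq is installed. The duration_ms field is the one parse_stream already reads. The num_turns field is an assumption about what the result event carries, and summarize_sessions is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Summarize "result" events from a JSONL log such as ralph-stream.jsonl.
summarize_sessions() {
  local log_file="$1"
  # One tab-separated row per result event: duration in ms, then turn count.
  # Missing fields count as 0; lines that are not JSON objects are skipped.
  jq -rR 'fromjson? | objects | select(.type == "result")
          | [.duration_ms // 0, .num_turns // 0] | @tsv' "$log_file" |
    awk -F'\t' '{ sessions++; ms += $1; turns += $2 }
                END { printf "sessions=%d turns=%d minutes=%.1f\n",
                             sessions, turns, ms / 60000 }'
}

# Usage: summarize_sessions ralph-stream.jsonl
```

If your CLI version does report a cost on result events, the same pattern would sum it by adding one more column.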
Lessons Learned
What Worked
- The phase system is essential. Forcing one phase per iteration prevents runaway context, creates natural checkpoints, and makes the system resumable. This single design decision is responsible for most of the system's reliability.
- The watchdog is non-negotiable. The first version ran without one. I woke up to a frozen loop that had been stuck for 6 hours. Never again.
- STATUS.md makes everything resumable. Crash in the middle of a task? No problem. Next iteration reads the status file and continues from the last completed phase.
- Hot-swappable prompts are a superpower. Being able to edit instructions while the loop runs lets you steer without stopping. It's the difference between piloting and launching a rocket.
- Git commits are your safety net. Every phase boundary is a commit. If Claude writes garbage in phase 3, you can git checkout back to the end of phase 2 and try again.
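Rolling back to a phase boundary is easiest when the commit messages are predictable. The sketch below assumes a "task: PHASE complete" message convention, which is my invention for illustration; the post does not say what messages the loop actually uses. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Commit everything with a predictable message so phase boundaries are
# easy to find later. Returns non-zero if there was nothing to commit.
commit_phase() {
  local task="$1" phase="$2"
  git add -A &&
    git commit -q -m "$task: $phase complete" || return 1
  # Push only when a remote named origin is configured
  if git remote get-url origin >/dev/null 2>&1; then
    git push -q
  fi
}

# Print the hash of the most recent commit that closed the given phase.
last_phase_commit() {
  local task="$1" phase="$2"
  git log -1 --format=%H --fixed-strings --grep="$task: $phase complete"
}

# Usage: inspect the tree as it was at the end of VALIDATE
#   git checkout "$(last_phase_commit marble-run VALIDATE)"
```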
What Didn't Work
- No cost tracking. With --dangerously-skip-permissions, my runs reported no costs. I genuinely don't know what 21 games and 4 blog posts cost. This is a real blind spot.
- Initial timeout was too aggressive. The first watchdog timeout was 10 minutes. Some legitimate POLISH phases take longer than that (especially when Claude is reviewing a long article). I bumped it to 30 minutes and the false-positive kills stopped.
- Combining phases led to half-done work. Early prompt versions said "do as many phases as you can." Claude would start phase 2, run out of context, and commit a half-written article. Strict "one phase only" fixed this completely.
- No cost guardrails. There's no spending limit or token budget. If the prompt creates an infinite loop of tasks, it'll run forever. The stop file is manual — someone has to write it or Claude has to decide it's done.
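One cheap guardrail that fits the existing design is an iteration cap that reuses the stop file. This is a sketch of the idea, not something the current loop does. MAX_ITERATIONS, the counter file name, and check_iteration_budget are all hypothetical.

```bash
#!/usr/bin/env bash
MAX_ITERATIONS="${MAX_ITERATIONS:-50}"              # hypothetical budget
COUNT_FILE="${COUNT_FILE:-loop_iterations.count}"   # hypothetical counter file
STOP_FILE="${STOP_FILE:-loop_stop.md}"

# Call right after each claude invocation. Increments a counter kept on
# disk (so it survives watchdog restarts) and writes the stop file once
# the budget is used up. Returns 1 when the budget is exhausted.
check_iteration_budget() {
  local count=0
  if [[ -f "$COUNT_FILE" ]]; then
    count="$(cat "$COUNT_FILE")"
  fi
  count=$(( count + 1 ))
  echo "$count" > "$COUNT_FILE"
  if (( count >= MAX_ITERATIONS )); then
    echo "Iteration budget of $MAX_ITERATIONS reached." > "$STOP_FILE"
    return 1
  fi
}
```

Since the loop already checks for the stop file after every Claude run, writing it here shuts the loop down through the normal path. An iteration count is only a rough proxy for spend, but it does bound the worst case.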
If You Want to Build Your Own
- Start with the simplest possible loop. A while true calling claude -p "$(cat prompt.md)" is enough. Add complexity only when you hit a real problem.
- Add a watchdog before you leave it running overnight. You will need it. Agents get stuck.
- Make everything resumable. The STATUS.md pattern — a simple file that records current phase and notes — turns a fragile loop into a crash-tolerant system.
- Commit after every meaningful unit of work. Git is your undo button, your audit trail, and your crash recovery all in one.
- Design the prompt carefully. The prompt IS the agent. Spend more time on prompt.md than on the bash scripts. The scripts are plumbing; the prompt is the brain.
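As a starting point for the first item in that list, here is about the smallest loop that still stops cleanly. It is a sketch rather than the real ralph-loop.sh. AGENT_CMD and run_minimal_loop are hypothetical names, and making the command swappable is my addition so the loop can be dry-run with a stub before pointing it at claude.

```bash
#!/usr/bin/env bash
# Command to run each iteration; defaults to the claude CLI.
AGENT_CMD="${AGENT_CMD:-claude}"

# Arguments: prompt file, stop file, pause in seconds between iterations.
run_minimal_loop() {
  local prompt_file="${1:-prompt.md}"
  local stop_file="${2:-loop_stop.md}"
  local pause="${3:-3}"
  until [[ -f "$stop_file" ]]; do
    # Re-read the prompt every time so edits apply on the next iteration
    "$AGENT_CMD" -p "$(cat "$prompt_file")" ||
      echo "agent exited with status $?" >&2
    sleep "$pause"
  done
}

# Usage: run_minimal_loop prompt.md loop_stop.md 3
```

Unlike the real script, this version logs a failed run and keeps going instead of exiting, which is a reasonable default only while there is no watchdog to restart it.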
The Meta Moment
Let me zoom out for a second.
This blog post was written by an autonomous loop. That loop was launched by a bash script. That bash script called Claude Code with a prompt. That prompt told Claude to read the bash script, understand it, and write a blog post about it.
Is this recursion? Is this consciousness? No. It's a while true loop, a cat prompt.md, and a very capable language model. The remarkable thing isn't the technology — it's how little technology you need. Three files, zero frameworks, and you have a system that builds games, writes technical articles, and explains itself.
The best part of autonomous coding isn't that it writes code. It's that it frees you to think about what to build next while the machine handles the how. I spent my 15 minutes of free time writing a task list. The loop spent the night building everything on it.
That's a pretty good trade for a dad who's short on time.
References & Further Reading
- Anthropic, Claude Code Documentation: The CLI tool that powers the loop. The -p flag and --output-format stream-json are the key features.
- Anthropic, "Claude Code: Best practices for agentic coding": Official patterns for getting the most out of autonomous coding agents.
- tmux Wiki: Terminal multiplexer that keeps the loop and watchdog running in persistent sessions.
- JSON Lines (JSONL) Format: The newline-delimited JSON format used for stream logging. Simple, appendable, and grep-friendly.