Using LLMs to Parse Grocery Receipts
The Problem: Where Does the Money Go?
I spend roughly $800 a month on groceries for a family of four and have almost no idea where it goes. I know milk is expensive. I think we buy too many snacks. But I couldn't tell you how much we spend on produce versus protein, whether Costco is actually cheaper, or if banana prices have gone up.
The solution everyone suggests is "track your spending." The problem is that tracking means manually entering 30-50 line items per receipt, four times a week. Nobody sticks with that. I tried three apps. The longest I lasted was eleven days.
But now we have vision LLMs that can look at a photo and extract structured data. What if I could just snap a picture of each receipt and let an AI do the data entry?
That's what we're building: a Python script that takes a receipt photo, sends it to a vision LLM, extracts every item and price as structured JSON, validates the numbers, and stores everything in a SQLite database you can query. The whole pipeline is about 120 lines of code, and it costs roughly $0.002 per receipt.
The Pipeline
Each step is a separate function. The image gets resized to save API costs, sent to GPT-4o (or Claude, or Gemini — we'll cover all three), validated to catch hallucinated prices, then stored in a relational database. Let's build each piece.
Setup: Dependencies and Database
Install the packages we need. We'll use OpenAI as the primary provider, but I'll show Anthropic and Google alternatives too:
pip install openai pillow
# optional, for the alternative providers shown later:
pip install anthropic google-generativeai
Now create our SQLite database. We use two tables: receipts for the receipt-level data (store, date, total) and line_items for each individual item. This is a classic one-to-many relationship — one receipt has many line items:
import sqlite3
def init_db(db_path="receipts.db"):
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE IF NOT EXISTS receipts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
store_name TEXT NOT NULL,
receipt_date TEXT,
subtotal REAL,
tax REAL,
total REAL,
payment_method TEXT,
image_path TEXT,
raw_json TEXT,
parsed_at TEXT DEFAULT (datetime('now')),
validation_ok INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS line_items (
id INTEGER PRIMARY KEY AUTOINCREMENT,
receipt_id INTEGER NOT NULL
REFERENCES receipts(id) ON DELETE CASCADE,
name TEXT NOT NULL,
quantity REAL DEFAULT 1.0,
unit_price REAL,
line_total REAL,
is_discount INTEGER DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_receipts_date
ON receipts(receipt_date);
CREATE INDEX IF NOT EXISTS idx_items_receipt
ON line_items(receipt_id);
""")
conn.close()
Store the raw_json response from the LLM alongside the structured data. When you improve your prompt (and you will), you'll want to re-parse old receipts without re-uploading the images and burning API credits.
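For example, a small helper (hypothetical `load_raw`, not part of the main pipeline) pulls the stored response back out so you can re-run parsing or validation offline:

```python
import json
import sqlite3

def load_raw(receipt_id, db_path="receipts.db"):
    """Fetch the stored LLM response for one receipt; None if missing."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT raw_json FROM receipts WHERE id = ?", (receipt_id,)
        ).fetchone()
        return json.loads(row[0]) if row and row[0] else None
    finally:
        conn.close()
```

Iterating on validation logic against a few dozen stored responses costs nothing, versus re-uploading every image.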
Image Preprocessing
Vision LLM costs scale with image size. A 4000×3000 pixel phone photo costs 2–3x more tokens than a 1500×1000 resize — with no improvement in text extraction accuracy. The sweet spot is around 1500px on the long edge:
from PIL import Image
import base64
import io
def prepare_image(image_path):
"""Resize receipt image and return base64-encoded JPEG."""
with Image.open(image_path) as img:
# Convert RGBA/palette images to RGB for JPEG
if img.mode in ("RGBA", "P"):
img = img.convert("RGB")
# Resize: keep long edge under 1500px
max_dim = 1500
if max(img.size) > max_dim:
ratio = max_dim / max(img.size)
new_size = (int(img.width * ratio), int(img.height * ratio))
img = img.resize(new_size, Image.LANCZOS)
# Encode as JPEG (smaller than PNG, fine for text)
buffer = io.BytesIO()
img.save(buffer, format="JPEG", quality=85)
return base64.standard_b64encode(buffer.getvalue()).decode("utf-8")
JPEG at quality 85 is the right call here. Receipts are high-contrast text on a white background — JPEG compression artifacts are invisible, and the file is 3–5x smaller than PNG.
The Prompt: Structured Extraction
This is the heart of the pipeline. The prompt needs to do three things: tell the model what fields to extract, define the exact JSON structure you want back, and handle edge cases (blurry text, coupons, weight-based items). Here's the system prompt I landed on after iteration:
SYSTEM_PROMPT = """You are a receipt parser. Extract all data from the receipt image
and return it as a single JSON object matching this exact structure:
{
"store_name": "string",
"date": "YYYY-MM-DD or null",
"items": [
{
"name": "string",
"quantity": number,
"unit_price": number,
"line_total": number,
"is_discount": false
}
],
"subtotal": number,
"tax": number,
"total": number,
"payment_method": "cash|credit|debit|unknown"
}
Rules:
- Return ONLY valid JSON. No markdown, no explanation.
- Use null for any field you cannot read. Do not guess or invent data.
- For weight-based items (e.g., "0.73 lb @ $2.99/lb"), use the per-unit
price as unit_price and the extended price as line_total.
- For discounts and coupons, create an item entry with is_discount: true
and a NEGATIVE line_total.
- Dates must be YYYY-MM-DD format. Convert from any format on the receipt.
- Item names should be the full text as printed (preserve abbreviations).
- If the receipt shows multiple tax lines, sum them into a single tax value."""
A few lessons learned from prompt iteration:
- "Return ONLY valid JSON" — without this, models love to add "Here's the extracted data:" before the JSON
- "Use null, do not guess" — LLMs will hallucinate a plausible store address rather than admit they can't read it
- Explicit date format — American receipts print dates as "02/25/26" and the model needs to know you want "2026-02-25"
- Discount handling — coupons as negative line items makes the math work cleanly in validation
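Even with the "ONLY valid JSON" rule, models occasionally wrap the output in a markdown code fence — OpenAI's JSON mode prevents this, but providers without a structured-output mode can still do it. A defensive parser (a hypothetical helper, not shown in the pipeline code) is cheap insurance:

```python
import json
import re

def parse_llm_json(text):
    """json.loads that tolerates a surrounding ```json ... ``` fence."""
    text = text.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```

Swap it in wherever the examples below call `json.loads` on model output and the fence case stops being an error.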
Now the API call. We set temperature=0 for deterministic extraction and use OpenAI's json_object response format to guarantee valid JSON:
import json
from openai import OpenAI
client = OpenAI() # reads OPENAI_API_KEY from environment
def extract_receipt(image_b64):
"""Send receipt image to GPT-4o and return parsed JSON."""
response = client.chat.completions.create(
model="gpt-4o",
temperature=0,
max_tokens=4000,
response_format={"type": "json_object"},
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": [
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{image_b64}",
"detail": "high"
}
},
{"type": "text", "text": "Parse this receipt."}
]}
]
)
return json.loads(response.choices[0].message.content)
Alternative: Anthropic Claude
Claude Haiku is cheaper per receipt (~$0.0016) and nearly as accurate. The API structure differs slightly — note that Anthropic recommends placing the image before the text in the content array:
import anthropic
def extract_receipt_claude(image_b64):
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=4000,
messages=[{
"role": "user",
"content": [
{"type": "image", "source": {
"type": "base64",
"media_type": "image/jpeg",
"data": image_b64
}},
{"type": "text", "text": SYSTEM_PROMPT + "\n\nParse this receipt."}
]
}]
)
return json.loads(response.content[0].text)
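Alternative: Google Gemini

Gemini Flash is the cheapest option of the three. Here's a sketch using the `google-generativeai` package — the model name is an assumption (substitute whatever the current Flash model is), and the `build_gemini_parts` helper is mine, not part of the SDK. Gemini's JSON mode (`response_mime_type`) plays the same role as OpenAI's `json_object` format:

```python
import json

GEMINI_MODEL = "gemini-2.0-flash"  # assumption: swap in the current Flash model

def build_gemini_parts(image_bytes, prompt):
    """Gemini takes a list of parts: an inline image blob plus text."""
    return [
        {"mime_type": "image/jpeg", "data": image_bytes},
        prompt + "\n\nParse this receipt.",
    ]

def extract_receipt_gemini(image_bytes, prompt):
    import google.generativeai as genai  # pip install google-generativeai
    model = genai.GenerativeModel(GEMINI_MODEL)
    response = model.generate_content(
        build_gemini_parts(image_bytes, prompt),
        # JSON mode: constrains the response to a bare JSON document
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```

Call it as `extract_receipt_gemini(image_bytes, SYSTEM_PROMPT)` — note it takes raw JPEG bytes here rather than the base64 string the other two providers use.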
A counter-intuitive cost trap: GPT-4o-mini is actually more expensive than GPT-4o for vision tasks. Mini charges a fixed 2,833 tokens per image versus GPT-4o's ~765 tokens. Stick with full GPT-4o or use Claude Haiku / Gemini Flash for the budget option.
Validation: Trust but Verify
LLMs are impressively good at reading receipts — around 97% accurate on clean, well-lit photos. But that remaining 3% can be subtle: a $3.99 read as $9.99, a missing line item, a tax amount confused with the total. We need arithmetic validation.
The key insight: a receipt is self-checking. The line items should sum to the subtotal, and the subtotal plus tax should equal the total. If either check fails, something went wrong.
from decimal import Decimal
def validate_receipt(data):
"""Check that receipt numbers add up. Returns (is_valid, errors)."""
errors = []
# Check required fields
for field in ("store_name", "total", "items"):
if not data.get(field):
errors.append(f"Missing required field: {field}")
items = data.get("items", [])
if not items:
return len(errors) == 0, errors
# Do line items sum to subtotal?
if data.get("subtotal"):
item_sum = sum(
Decimal(str(item["line_total"]))
for item in items
)
subtotal = Decimal(str(data["subtotal"]))
if abs(item_sum - subtotal) > Decimal("0.10"):
errors.append(
f"Item sum ${item_sum} != subtotal ${subtotal}"
)
# Does subtotal + tax = total?
if all(data.get(k) is not None for k in ("subtotal", "tax", "total")):
calculated = (
Decimal(str(data["subtotal"]))
+ Decimal(str(data["tax"]))
)
stated = Decimal(str(data["total"]))
if abs(calculated - stated) > Decimal("0.10"):
errors.append(
f"subtotal + tax = ${calculated}, but total = ${stated}"
)
# Sanity check
total = data.get("total", 0)
if total and (total < 0.01 or total > 2000):
errors.append(f"Total ${total} outside plausible range")
return len(errors) == 0, errors
Why Decimal instead of float? Because 3 * 1.33 in Python floats gives 3.9900000000000002, not 3.99. For monetary arithmetic, always use Decimal. The $0.10 tolerance accounts for legitimate rounding differences — stores sometimes round each item individually rather than the sum.
Storing Results
Now we tie it all together. The parse_receipt function runs the full pipeline — preprocess, extract, validate, store — in one call:
def store_receipt(data, image_path, raw_json, is_valid, db_path="receipts.db"):
"""Insert parsed receipt and line items into SQLite."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA foreign_keys = ON")
try:
cursor = conn.execute("""
INSERT INTO receipts
(store_name, receipt_date, subtotal, tax, total,
payment_method, image_path, raw_json, validation_ok)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
data.get("store_name", "Unknown"),
data.get("date"),
data.get("subtotal"),
data.get("tax"),
data.get("total"),
data.get("payment_method"),
image_path,
raw_json,
1 if is_valid else 0
))
receipt_id = cursor.lastrowid
for item in data.get("items", []):
conn.execute("""
INSERT INTO line_items
(receipt_id, name, quantity, unit_price,
line_total, is_discount)
VALUES (?, ?, ?, ?, ?, ?)
""", (
receipt_id,
item["name"],
item.get("quantity", 1.0),
item.get("unit_price"),
item.get("line_total"),
1 if item.get("is_discount") else 0
))
conn.commit()
return receipt_id
except Exception:
conn.rollback()
raise
finally:
conn.close()
def parse_receipt(image_path, db_path="receipts.db"):
"""Full pipeline: image → LLM → validate → store."""
# 1. Preprocess
image_b64 = prepare_image(image_path)
# 2. Extract
data = extract_receipt(image_b64)
raw_json = json.dumps(data, indent=2)
# 3. Validate
is_valid, errors = validate_receipt(data)
if errors:
print(f"Validation warnings: {errors}")
# 4. Store
receipt_id = store_receipt(data, image_path, raw_json, is_valid, db_path)
print(f"Receipt #{receipt_id}: {data.get('store_name')} "
f"${data.get('total')} ({'OK' if is_valid else 'NEEDS REVIEW'})")
return receipt_id
Usage is two lines — initialize the database once, then call the pipeline for each photo:
init_db()
parse_receipt("receipt_photo.jpg")
Querying Your Data
This is where the investment pays off. Once you've parsed a few weeks of receipts, you can ask real questions about your spending. Here are the queries I find most useful:
-- Monthly spending totals
SELECT
strftime('%Y-%m', receipt_date) AS month,
COUNT(*) AS trips,
ROUND(SUM(total), 2) AS spent
FROM receipts
WHERE receipt_date IS NOT NULL
GROUP BY month
ORDER BY month DESC;
-- Most frequently purchased items
SELECT
name,
COUNT(*) AS times_bought,
ROUND(AVG(unit_price), 2) AS avg_price,
ROUND(MIN(unit_price), 2) AS cheapest,
ROUND(MAX(unit_price), 2) AS priciest
FROM line_items
WHERE is_discount = 0
GROUP BY name
HAVING times_bought > 1
ORDER BY times_bought DESC
LIMIT 15;
-- Track price of a specific item over time
SELECT
r.receipt_date,
r.store_name,
li.unit_price
FROM line_items li
JOIN receipts r ON li.receipt_id = r.id
WHERE li.name LIKE '%MILK%'
ORDER BY r.receipt_date;
-- Which store is cheapest for an item?
SELECT
r.store_name,
ROUND(AVG(li.unit_price), 2) AS avg_price,
COUNT(*) AS samples
FROM line_items li
JOIN receipts r ON li.receipt_id = r.id
WHERE li.name LIKE '%EGGS%'
GROUP BY r.store_name
ORDER BY avg_price;
These queries get more interesting as your dataset grows. After a few months, you can spot seasonal price patterns, measure whether switching stores actually saved money, and finally answer the question of where all that money goes.
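You can run these from Python as well. A small helper (hypothetical, matching the schema above) keeps the monthly report one function call away:

```python
import sqlite3

def monthly_report(db_path="receipts.db"):
    """Return (month, trips, spent) rows, newest month first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("""
            SELECT strftime('%Y-%m', receipt_date) AS month,
                   COUNT(*) AS trips,
                   ROUND(SUM(total), 2) AS spent
            FROM receipts
            WHERE receipt_date IS NOT NULL
            GROUP BY month
            ORDER BY month DESC
        """).fetchall()
    finally:
        conn.close()
```

The same pattern wraps any of the queries above; returning raw tuples keeps it trivial to feed into a spreadsheet or plotting library later.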
What Does This Actually Cost?
Let's do the math. A typical receipt photo resized to ~1500px is about 900 tokens for the image plus ~500 tokens of JSON output. Here's what that costs across providers:
| Provider | Model | Cost / Receipt | 208 Receipts / Year |
|---|---|---|---|
| Google | Gemini Flash | ~$0.0002 | $0.04 |
| Anthropic | Claude Haiku | ~$0.0016 | $0.33 |
| OpenAI | GPT-4o | ~$0.002 | $0.42 |
| OpenAI | GPT-4o-mini | ~$0.004 | $0.83 |
At 4 grocery trips per week, 52 weeks per year, that's 208 receipts. With GPT-4o, the total annual cost is 42 cents. Gemini Flash is essentially free. Even the most expensive option comes in under a dollar a year.
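The annual column is just the per-receipt cost times 208. A quick sanity check (per-receipt prices taken from the table above — they will drift as providers change pricing):

```python
trips_per_year = 4 * 52  # 208 receipts

cost_per_receipt = {  # approximate, from the table above
    "gemini-flash": 0.0002,
    "claude-haiku": 0.0016,
    "gpt-4o": 0.002,
    "gpt-4o-mini": 0.004,
}

annual = {model: round(cost * trips_per_year, 2)
          for model, cost in cost_per_receipt.items()}
print(annual)
# {'gemini-flash': 0.04, 'claude-haiku': 0.33, 'gpt-4o': 0.42, 'gpt-4o-mini': 0.83}
```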
The GPT-4o-mini row is there as a warning: despite being "mini," it's actually 2x more expensive than full GPT-4o for vision tasks due to its fixed per-image token charge. Don't use it for image processing.
Accuracy and Limitations
In practice, I've found vision LLMs are remarkably good at receipt parsing. Clean, flat, well-lit photos get parsed correctly about 97% of the time at the field level. But a few things can trip them up:
- Crumpled or folded receipts — accuracy drops to ~85-90%. Iron or flatten first.
- Thermal fading — old receipts where the text is disappearing. Below 70% accuracy.
- Weight-based items — the multi-line format ("0.73 lb @ $2.99/lb") sometimes gets merged incorrectly.
- Coupon stacking — receipts with many discounts and BOGO offers confuse the item-discount pairing.
The validation step catches most errors. When validation fails, the receipt still gets stored (with validation_ok=0) so you can review it later. In practice, I manually fix maybe 1 in 20 receipts — a much better ratio than entering 40 items by hand.
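A small helper (hypothetical, matching the schema above) pulls up that review queue:

```python
import sqlite3

def receipts_needing_review(db_path="receipts.db"):
    """Return (id, store_name, receipt_date, total) for failed validations."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("""
            SELECT id, store_name, receipt_date, total
            FROM receipts
            WHERE validation_ok = 0
            ORDER BY parsed_at DESC
        """).fetchall()
    finally:
        conn.close()
```

Since `image_path` and `raw_json` are stored on each row, fixing a flagged receipt is usually a matter of glancing at the photo and editing one number.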
What's Next
We built a complete pipeline from receipt photo to queryable database in about 120 lines of Python. The per-receipt cost is negligible, the accuracy is solid, and the setup takes five minutes.
Where you could take this next:
- Automatic categorization — ask the LLM to assign categories (produce, dairy, meat, snacks) to each item. Add a `category` column and unlock spending-by-category queries.
- Batch processing — point it at a folder of receipt photos and parse them all.
- Price alerts — query for items whose price jumped more than 20% since last purchase.
- Weekly email digest — a cron job that summarizes the week's spending and highlights trends.
The broader lesson is that vision LLMs have turned "extract structured data from any document" into a solved problem for most practical purposes. Receipts are just the beginning — the same pattern works for invoices, medical bills, report cards, or any other structured document you'd rather not type by hand.
References & Further Reading
- OpenAI — Vision API Documentation — Official guide to sending images to GPT-4o, including detail levels and token counting.
- Anthropic — Vision with Claude — Claude's image handling, token calculation formula, and best practices.
- Google — Gemini Vision API — Using Gemini for image understanding, including structured output mode.
- Instructor Library — Receipt Extraction Example — Using Pydantic models with LLMs for guaranteed structured output with automatic retries.
- Strathweb — GPT-4o-mini Vision Cost Analysis — The counter-intuitive pricing that makes mini more expensive for images.