Andrej Karpathy built AI that trains itself overnight. Eric Siu applied the same loop to marketing. Here's the pattern — and the tool to run it yourself.
"Most marketing teams run 20-30 experiments a year. Maybe 52 if they're 'good'. New landing page. New ad creative. Maybe a subject line test. That's considered data-driven marketing. But the next generation of marketing systems will run 36,500+ experiments per year."
— Eric Siu · Founder, Single Grain · Applying Karpathy's AutoResearch to Marketing

On March 7, 2026, Andrej Karpathy pushed a 630-line Python script to GitHub and went to sleep. By morning his AI agent had run 50 experiments, discovered a better learning rate, and committed the proof — without a single human instruction in between.
The script wasn't doing something exotic. It was automating what every researcher does: modify something, measure the result, keep or discard, repeat. Karpathy just removed the human from the loop.
Within days, Eric Siu made the marketing translation explicit: replace the training script with a marketing asset. Replace validation loss with reply rate. The loop is identical.
AutoResearch isn't magic. It's three constraints — and each constraint is doing specific engineering work:
- One thing the agent can modify. A confined search space keeps every result interpretable.
- One number that decides whether the change was better. No committees, no human judgment required.
- One fixed duration per run, so every experiment is directly comparable regardless of what changed.
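The loop itself is tiny. Here is a minimal sketch of the keep-or-discard cycle under the three constraints — the function names and the stand-in objective are hypothetical illustrations, not Karpathy's actual script:

```python
import random

def mutate(candidate: str) -> str:
    """Constraint 1: change exactly ONE thing (here, a single numeric knob)."""
    value = float(candidate)
    return str(value * random.choice([0.5, 2.0]))  # e.g. halve or double a learning rate

def evaluate(candidate: str) -> float:
    """Constraints 2 and 3: one number, measured over a fixed duration.
    Stand-in objective: distance from an optimum of 0.01 (unknown to the loop)."""
    return abs(float(candidate) - 0.01)

def autoresearch(baseline: str, experiments: int) -> str:
    best_score = evaluate(baseline)
    for _ in range(experiments):
        variant = mutate(baseline)      # 1. modify one thing
        score = evaluate(variant)       # 2. measure one number
        if score < best_score:          # 3. keep or discard
            baseline, best_score = variant, score
    return baseline                     # the winner becomes the new baseline

print(autoresearch("0.1", experiments=50))
```

Because a variant is only kept when its score improves, the baseline can never get worse — which is why the loop can safely run unattended overnight.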
Every component of Karpathy's ML loop has a direct marketing equivalent:
| AutoResearch (ML) | Marketing Loop | Example |
|---|---|---|
| train.py | Your marketing asset | Cold email, landing page, ad creative |
| program.md | Your experiment brief | Audience, goal, constraints, what not to change |
| val_bpb metric | Your success signal | Reply rate, open rate, meeting book rate |
| AI modifies train.py | AI generates variants | 10 subject line variants each with a hypothesis |
| 5-min training run | Deployment window | Send to test segment, measure for 24hrs |
| Keep or discard | Keep or discard | Winner becomes new baseline, loop continues |
| 100 experiments/night | Compounding advantage | Proprietary map of what resonates with YOUR audience |
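In code, the experiment brief that plays the role of program.md can be a plain config. A sketch with illustrative fields and a stubbed-out measurement — none of this is the tool's real API:

```python
# Hypothetical experiment brief: the marketing analog of program.md.
brief = {
    "asset": "cold email subject line",
    "audience": "B2B SaaS founders, 11-50 employees",   # who to test on
    "metric": "reply_rate",                             # the one success signal
    "window_hours": 24,                                 # fixed measurement window
    "do_not_change": ["sender name", "send time", "body copy"],
    "baseline": "Quick question about {company}'s growth",
}

def run_experiment(variant: str, brief: dict) -> float:
    """Deploy one variant to a test segment, wait brief['window_hours'],
    and return the single success metric. Stubbed with a fake number here;
    a real run would query your email platform's reporting API."""
    return 0.042  # placeholder reply rate

baseline_score = run_experiment(brief["baseline"], brief)
```

The `do_not_change` list does the same job as confining the agent to one file: it keeps the search space narrow enough that a winning variant is attributable to the one thing that changed.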
Before: the human is in the critical path. Every experiment requires design, copy, deployment, and review.

After: the human sets the goal and approves parameters. The AI handles hypothesis generation; you review the morning report.
1. Paste your current subject line, headline, or CTA and pick your success metric. This is your program.md.
2. Claude generates N variants, each with an explicit hypothesis about which assumption it tests.
3. Run each variant against a segment of your list, enter the result, and mark KEEP or DISCARD.
4. Hit Continue and the AI generates the next batch using your winners as context. Patterns compound.
5. After dozens of iterations you have a ranked experiment log unique to your audience. Competitors can't buy it.
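The steps above accumulate exactly that asset: a ranked experiment log. A sketch with made-up results showing how winners become the new baseline while every row — kept or discarded — stays in the record:

```python
log = []  # grows by one row per experiment; this dataset is the moat

def record(variant: str, hypothesis: str, score: float, baseline_score: float) -> bool:
    """Append one experiment row and return whether it beat the baseline."""
    kept = score > baseline_score
    log.append({"variant": variant, "hypothesis": hypothesis,
                "score": score, "kept": kept})
    return kept

baseline, baseline_score = "Quick question", 0.031  # illustrative reply rates

for variant, hypothesis, score in [
    ("Quick question about {company}", "personalization lifts replies", 0.044),
    ("QUICK QUESTION",                 "urgency via all-caps",          0.012),
]:
    if record(variant, hypothesis, score, baseline_score):
        baseline, baseline_score = variant, score  # winner is the new baseline

ranked = sorted(log, key=lambda row: row["score"], reverse=True)
print(ranked[0]["variant"])  # prints "Quick question about {company}"
```

Note that discarded variants are as valuable as winners: "all-caps hurts replies" is a finding about your audience that a competitor would have to burn their own list to learn.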
"The companies that win won't have better marketers. They'll have faster experiment loops." — Eric Siu
This isn't about subject lines. The overnight loop is a compounding knowledge machine. Every experiment builds a dataset of what works for your specific audience, offer, and market position. That dataset is yours alone.
Karpathy's insight applies directly: the bottleneck of progress is no longer the ability to run experiments — it's the ability to define the right constraints. The human role shifts from experimenter to experimental designer.
Bring your own Anthropic API key. Stays in your browser — never stored, never shared with anyone.
▶ Open the Tool