Andrej Karpathy built AI that trains itself overnight. Eric Siu applied the same loop to marketing. Here's the pattern — and the tool to run it yourself.
"Most marketing teams run 20-30 experiments a year. Maybe 52 if they're 'good'. New landing page. New ad creative. Maybe a subject line test. That's considered data-driven marketing. But the next generation of marketing systems will run 36,500+ experiments per year."
— Eric Siu · Founder, Single Grain · Applying Karpathy's AutoResearch to Marketing

On March 7, 2026, Andrej Karpathy pushed a 630-line Python script to GitHub and went to sleep. By morning his AI agent had run 50 experiments, discovered a better learning rate, and committed the proof — without a single human instruction in between.
The script wasn't doing something exotic. It was automating what every researcher does: modify something, measure the result, keep or discard, repeat. Karpathy just removed the human from the loop.
Within days, Eric Siu made the marketing translation explicit: replace the training script with a marketing asset. Replace validation loss with reply rate. The loop is identical.
AutoResearch isn't magic. It's three constraints — and each constraint is doing specific engineering work:
- One thing the agent can modify. A confined search space keeps every result interpretable.
- One number that decides whether the change was better. No committees, no human judgment required.
- One fixed duration per run, so every experiment is directly comparable regardless of what changed.
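The loop itself is tiny. Here is a minimal sketch of the keep-or-discard cycle under the three constraints — the function names and the stand-in objective are hypothetical illustrations, not Karpathy's actual script:

```python
import random

def mutate(candidate: str) -> str:
    """Constraint 1: change exactly ONE thing (here, a single numeric knob)."""
    value = float(candidate)
    return str(value * random.choice([0.5, 2.0]))  # e.g. halve or double a learning rate

def evaluate(candidate: str) -> float:
    """Constraints 2 and 3: one number, measured over a fixed duration.
    Stand-in objective: distance from an optimum of 0.01 (unknown to the loop)."""
    return abs(float(candidate) - 0.01)

def autoresearch(baseline: str, experiments: int) -> str:
    best_score = evaluate(baseline)
    for _ in range(experiments):
        variant = mutate(baseline)      # 1. modify one thing
        score = evaluate(variant)       # 2. measure one number
        if score < best_score:          # 3. keep or discard
            baseline, best_score = variant, score
    return baseline                     # the winner becomes the new baseline

print(autoresearch("0.1", experiments=50))
```

Because a variant is only kept when its score improves, the baseline can never get worse — which is why the loop can safely run unattended overnight.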
Every component of Karpathy's ML loop has a direct marketing equivalent:
| AutoResearch (ML) | Marketing Loop | Example |
|---|---|---|
| train.py | Your marketing asset | Cold email, landing page, ad creative |
| program.md | Your experiment brief | Audience, goal, constraints, what not to change |
| val_bpb metric | Your success signal | Reply rate, open rate, meeting book rate |
| AI modifies train.py | AI generates variants | 10 subject line variants each with a hypothesis |
| 5-min training run | Deployment window | Send to test segment, measure for 24hrs |
| Keep or discard | Keep or discard | Winner becomes new baseline, loop continues |
| 100 experiments/night | Compounding advantage | Proprietary map of what resonates with YOUR audience |
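In code, the experiment brief that plays the role of program.md can be a plain config. A sketch with illustrative fields and a stubbed-out measurement — none of this is the tool's real API:

```python
# Hypothetical experiment brief: the marketing analog of program.md.
brief = {
    "asset": "cold email subject line",
    "audience": "B2B SaaS founders, 11-50 employees",   # who to test on
    "metric": "reply_rate",                             # the one success signal
    "window_hours": 24,                                 # fixed measurement window
    "do_not_change": ["sender name", "send time", "body copy"],
    "baseline": "Quick question about {company}'s growth",
}

def run_experiment(variant: str, brief: dict) -> float:
    """Deploy one variant to a test segment, wait brief['window_hours'],
    and return the single success metric. Stubbed with a fake number here;
    a real run would query your email platform's reporting API."""
    return 0.042  # placeholder reply rate

baseline_score = run_experiment(brief["baseline"], brief)
```

The `do_not_change` list does the same job as confining the agent to one file: it keeps the search space narrow enough that a winning variant is attributable to the one thing that changed.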
Before: the human is in the critical path. Every experiment requires design, copy, deployment, and review.

After: the human sets the goal and approves parameters. The AI handles hypothesis generation; you review the morning report.
1. Paste your current subject line, headline, or CTA and pick your success metric. This is your program.md.
2. Claude generates N variants, each with an explicit hypothesis about which assumption it tests.
3. Run each variant against a segment of your list, enter the result, and mark KEEP or DISCARD.
4. Hit Continue and the AI generates the next batch using your winners as context. Patterns compound.
5. After dozens of iterations you have a ranked experiment log unique to your audience. Competitors can't buy it.
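The steps above accumulate exactly that asset: a ranked experiment log. A sketch with made-up results showing how winners become the new baseline while every row — kept or discarded — stays in the record:

```python
log = []  # grows by one row per experiment; this dataset is the moat

def record(variant: str, hypothesis: str, score: float, baseline_score: float) -> bool:
    """Append one experiment row and return whether it beat the baseline."""
    kept = score > baseline_score
    log.append({"variant": variant, "hypothesis": hypothesis,
                "score": score, "kept": kept})
    return kept

baseline, baseline_score = "Quick question", 0.031  # illustrative reply rates

for variant, hypothesis, score in [
    ("Quick question about {company}", "personalization lifts replies", 0.044),
    ("QUICK QUESTION",                 "urgency via all-caps",          0.012),
]:
    if record(variant, hypothesis, score, baseline_score):
        baseline, baseline_score = variant, score  # winner is the new baseline

ranked = sorted(log, key=lambda row: row["score"], reverse=True)
print(ranked[0]["variant"])  # prints "Quick question about {company}"
```

Note that discarded variants are as valuable as winners: "all-caps hurts replies" is a finding about your audience that a competitor would have to burn their own list to learn.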
"The companies that win won't have better marketers. They'll have faster experiment loops." — Eric Siu
This isn't about subject lines. The overnight loop is a compounding knowledge machine. Every experiment builds a dataset of what works for your specific audience, offer, and market position. That dataset is yours alone.
Karpathy's insight applies directly: the bottleneck of progress is no longer the ability to run experiments — it's the ability to define the right constraints. The human role shifts from experimenter to experimental designer.
Bring your own Anthropic API key. Stays in your browser — never stored, never shared with anyone.
▶ Open the Tool