AutoKarp

Run autoresearch in the cloud

Karpathy's autonomous ML research loop, hosted on managed GPUs. Edit program.md, hit start, check results in the morning.

How it works

1. Edit program.md

Describe your research goals in plain English. The program file tells the agent what to explore and how to evaluate results.
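A program.md might look like the following. This is a hypothetical sketch — the structure and constraints shown here are illustrative, not a required format; only val_bpb, results.tsv, and train.py come from the docs on this page.

```markdown
# Goal
Reduce val_bpb on the training task in train.py.

# Constraints (illustrative)
- Only modify train.py.
- Prefer small, reviewable changes per iteration.

# Evaluation
- Primary metric: val_bpb (lower is better), as logged to results.tsv.
- Keep a change only if val_bpb improves over the previous best.
```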

2. Start a run

We provision an RTX 4090 on RunPod, fetch the upstream autoresearch code at a pinned commit, and begin. The LLM agent iterates autonomously — no babysitting required.

3. Check results

Watch the live val_bpb chart, results.tsv, and full diffs as the agent works. Download the complete workspace when done; it's replayable locally with uv run train.py.

Under the hood

AutoKarp runs Karpathy's autoresearch project on managed GPUs. Here's exactly what happens when you hit start.

Pinned upstream

We vendor autoresearch at commit 228791fb. train.py, prepare.py, pyproject.toml, and uv.lock come from that exact SHA — no forks, no patches. The agent loop approximates the upstream workflow using the same constraints and evaluation.

Single GPU class

Every run uses an NVIDIA RTX 4090 on RunPod. Results are hardware-specific — val_bpb may differ if you replay on a different GPU locally.

Read-only files

prepare.py, pyproject.toml, and uv.lock are locked. The LLM agent only modifies train.py. You control program.md — the instructions that guide the agent.

Fully replayable downloads

Completed runs produce a tarball with the full .git/ history, program.md, train.py, results.tsv, and all logs. cd in and run uv run train.py to reproduce locally.
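Replaying a downloaded run might look like this. The tarball filename is illustrative — yours will be named after your run; only the .git/ history, results.tsv, and uv run train.py are described on this page.

```shell
# Unpack the downloaded workspace (filename is illustrative).
tar -xzf autokarp-run.tar.gz
cd autokarp-run

# The full .git/ history ships with the run: review what the agent changed.
git log --oneline -- train.py

# Reproduce the final training run locally.
# Note: val_bpb may differ from the hosted run on different hardware.
uv run train.py
```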

Simple pricing

Explorer

$39 one-time

5 runs

  • 5 runs to try AutoKarp
  • Up to 25 iterations per run
  • 2 concurrent runs

Researcher (Popular)

$99/mo

15 runs/mo

  • 15 runs/month
  • Up to 50 iterations per run
  • 3 concurrent runs

Lab

$249/mo

45 runs/mo

  • 45 runs/month
  • Up to 50 iterations per run
  • 5 concurrent runs