Karpathy's autonomous ML research loop, hosted on managed GPUs. Edit program.md, hit start, check results in the morning.
Describe your research goals in plain English. The program file tells the agent what to explore and how to evaluate results.
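A program.md could be as simple as a goal plus constraints. This sketch is purely hypothetical — the headings, the time budget, and every detail are illustrative, not a required schema:

```markdown
# Goal
Reduce val_bpb on the default dataset without increasing training time.

# Constraints
- Modify train.py only; data prep and dependencies are locked.
- Keep each experiment short enough to iterate many times overnight.

# Evaluation
Lower final val_bpb in results.tsv wins. Log every experiment you try.
```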
We provision an RTX 4090 on RunPod, fetch the upstream autoresearch code at a pinned commit, and begin. The LLM agent iterates autonomously — no babysitting required.
Live val_bpb chart, results.tsv, and full diffs as the agent works. Download the complete workspace when done — it's replayable locally with uv run train.py.
AutoKarp runs Karpathy's autoresearch project on managed GPUs. Here's exactly what happens when you hit start.
We vendor autoresearch at commit 228791fb. train.py, prepare.py, pyproject.toml, and uv.lock come from that exact SHA — no forks, no patches. The agent loop approximates the upstream workflow using the same constraints and evaluation.
Every run uses an NVIDIA RTX 4090 on RunPod. Results are hardware-specific — val_bpb may differ if you replay on a different GPU locally.
prepare.py, pyproject.toml, and uv.lock are locked. The LLM agent only modifies train.py. You control program.md — the instructions that guide the agent.
Completed runs produce a tarball with the full .git/ history, program.md, train.py, results.tsv, and all logs. cd into the extracted workspace and run uv run train.py to reproduce the result locally.
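Replaying a finished run locally might look like the following. The archive and directory names are placeholders (your run's artifact will differ), and it assumes you have uv installed and a compatible GPU:

```shell
tar -xzf <your-run>.tar.gz     # the tarball downloaded from AutoKarp
cd <your-run>                  # extracted workspace with .git/, logs, etc.
git log --oneline              # browse every edit the agent made
uv run train.py                # re-run the final train.py; uv.lock pins deps
```

Because uv.lock ships in the workspace, uv resolves the exact dependency versions from the run — though val_bpb may still differ on non-4090 hardware, as noted above.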
5 runs
15 runs/mo
45 runs/mo