It's over. Karpathy just open-sourced an autonomous AI researcher that runs 100 experiments while you sleep.
You don't write the training code anymore.
You write a prompt that tells an AI agent how to think about research.
The agent edits the code, trains a small language model for exactly five minutes, checks the score, keeps or discards the result, and loops. All night. No human in the loop.
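To make the loop concrete, here is a minimal sketch of the keep-or-discard logic. It is not the code from the repo: `run_research_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, and the toy functions stand in for the agent's code edit and the five-minute training run.

```python
import math
import random


def run_research_loop(propose, evaluate, initial_config, num_experiments, seed=0):
    """Greedy loop: keep a candidate only if its loss beats the best so far."""
    rng = random.Random(seed)
    best_config = dict(initial_config)
    best_loss = evaluate(best_config)
    history = [best_loss]  # best loss after each experiment
    for _ in range(num_experiments):
        candidate = propose(best_config, rng)  # stand-in for the agent's edit
        loss = evaluate(candidate)             # stand-in for one training run
        if loss < best_loss:
            best_config, best_loss = candidate, loss  # keep the change
        # otherwise the candidate is discarded and the best config is unchanged
        history.append(best_loss)
    return best_config, best_loss, history


def toy_propose(config, rng):
    """Hypothetical edit: scale the learning rate by a random factor."""
    new_config = dict(config)
    new_config["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    return new_config


def toy_evaluate(config):
    """Toy stand-in for validation loss, lowest at lr = 0.01."""
    return (math.log10(config["lr"]) + 2.0) ** 2


if __name__ == "__main__":
    config, loss, history = run_research_loop(
        toy_propose, toy_evaluate, {"lr": 1.0}, num_experiments=100
    )
    print(config, loss)
```

Because a change is kept only when the loss drops, the best score never gets worse from one experiment to the next.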
That fixed five-minute clock is the quiet genius. No matter what the agent changes (the network size, the learning rate, the entire architecture), every run gets the same wall-clock budget, so results are compared on equal footing. This turns open-ended research into a game with a clear score:
- 12 experiments per hour, ~100 overnight
- Validation loss measures how well the model predicts unseen text
- Lower score wins, everything else is fair game
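A rough sketch of the fixed-clock idea and the arithmetic behind those numbers. The function names are hypothetical, the eight-hour night is an assumption, and the count ignores any time the agent spends between runs.

```python
import time


def train_for_budget(step_fn, budget_seconds, clock=time.monotonic):
    """Run training steps until the wall-clock budget is spent.

    The stopping rule is elapsed time, not a step count, so a bigger or
    slower model simply gets fewer steps inside the same budget.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step_fn()
        steps += 1
    return steps


def experiments_in(hours, minutes_per_run=5):
    """Back-to-back runs that fit in a time window, ignoring overhead."""
    return int(hours * 60 // minutes_per_run)


if __name__ == "__main__":
    print(experiments_in(1))  # runs per hour
    print(experiments_in(8))  # runs in an assumed eight-hour night
```

At five minutes per run, one hour fits 12 runs and eight hours fits 96, which is where the "~100 overnight" figure comes from.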
The agent touches one Python file containing the full training recipe. You never open it. Instead, you program a markdown file that shapes the agent's research strategy.
Your job becomes programming the programmer, and this unlocks a strange new loop:
1. Agents run real experiments without supervision
2. Prompt quality becomes the bottleneck, not researcher hours
3. Results auto-optimize for your specific hardware, since the budget is wall-clock time on your GPU
4. Anyone with one GPU can run a research lab overnight
The best AI labs won't just have the most compute.
They'll have the best instructions for agents who never sleep, never forget a failed experiment, and never stop iterating.
Andrej Karpathy (@karpathy):
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.
https://github.com/karpathy/autoresearch
Part code, part sci-fi, and a pinch of psychosis :)


