Can coding agents stay coherent over a 1-billion-token budget?
Can they build Slack from scratch?
Rewrite a JAX codebase in PyTorch?
Build a C compiler in Rust?
Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.