Seeing a number of benchmarks showing Opus is the best model for long-running work.

Five tips for running Opus autonomously for hours/days:

1. Use auto mode for permissions, so Claude doesn’t ask for approval
2. Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
3. Use /goal or /loop, to nudge Claude to keep going until it’s done
4. Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
5. Make sure Claude has a way to self-verify its work end to end: the Claude in Chrome browser extension for web, an iOS/Android simulator MCP for mobile, and a way to start the full web server or service for backend work
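
For the backend case in tip 5, one possible shape for a self-verification hook is a small script the agent can run after each change: start the service, wait until it answers, then shut it down. This is only a sketch under assumptions. The function names `wait_until_ready` and `verify_service` are hypothetical, the start command and URL are placeholders for your own service, and nothing here is part of Claude Code itself.

```python
import subprocess
import sys
import time
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 10.0, interval_s: float = 0.2) -> bool:
    """Poll `url` until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and HTTP error statuses
            # (urllib's URLError and HTTPError both derive from OSError).
            pass
        time.sleep(interval_s)
    return False


def verify_service(start_cmd: list[str], url: str, timeout_s: float = 10.0) -> bool:
    """Start the service (hypothetical command), probe it, and always stop it."""
    proc = subprocess.Popen(
        start_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return wait_until_ready(url, timeout_s)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    # Placeholder service: Python's built-in static file server on port 8765.
    cmd = [sys.executable, "-m", "http.server", "8765", "--bind", "127.0.0.1"]
    ok = verify_service(cmd, "http://127.0.0.1:8765/")
    sys.exit(0 if ok else 1)
```

The exit code gives the agent an unambiguous pass/fail signal; a real check would likely replace the single readiness probe with requests against the endpoints the change touched.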

Rishi Desai (@rishi_desai2)

Can coding agents stay coherent over a 1 billion token budget? Can they build Slack from scratch? Rewrite a JAX codebase in PyTorch? Build a C compiler in Rust? Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.

@bcherny: "Seeing a number of benchmarks showing Opus is the…" | ADHX