AGENTS.md files don't scale beyond modest codebases.
There has been a lot of discussion about this lately.
If you're building serious software with Claude Code or any other agentic tool, a single AGENTS.md will eventually fail you. This paper shows what comes next.
A 1,000-line prototype can be fully described in a single prompt. A 100,000-line system cannot. The AI must be told, repeatedly and reliably, how the project works, what patterns to follow, and what mistakes to avoid.
Single-file manifests hit a ceiling fast.
This new paper, Codified Context, documents a three-tier infrastructure built during real development of a 108,000-line C# distributed system across 283 sessions over 70 days.
The system uses a three-tier memory architecture: a hot-memory constitution (660 lines, always loaded), 19 specialized domain-expert agents (9,300 lines total) invoked per task, and a cold-memory knowledge base of 34 specification documents (~16,250 lines) queried on demand via an MCP retrieval server.
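To make the three tiers concrete, here is a minimal Python sketch of how context for one task might be assembled. It is not the paper's implementation: the class names, the keyword-based agent routing, the naive word-overlap search (a stand-in for the MCP retrieval server), and all sample data are hypothetical, chosen only to illustrate the always-loaded / per-task / on-demand split.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Tier 2: a domain-expert agent spec, loaded only for matching tasks."""
    name: str
    triggers: set[str]  # hypothetical keywords that route a task to this agent
    instructions: str


@dataclass
class ContextStore:
    constitution: str                                    # tier 1: hot memory, always loaded
    agents: list[Agent] = field(default_factory=list)    # tier 2: invoked per task
    specs: dict[str, str] = field(default_factory=dict)  # tier 3: cold memory, queried on demand

    def select_agents(self, task: str) -> list[Agent]:
        """Pick agents whose trigger keywords appear in the task text."""
        words = set(task.lower().split())
        return [a for a in self.agents if a.triggers & words]

    def query_specs(self, query: str, limit: int = 2) -> list[str]:
        """Naive word-overlap search, standing in for a retrieval-server call."""
        terms = set(query.lower().split())
        scored = []
        for title, body in self.specs.items():
            score = len(terms & set(body.lower().split()))
            if score:
                scored.append((score, title))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [title for _, title in scored[:limit]]

    def build_context(self, task: str) -> str:
        """Concatenate hot memory, matching agents, and retrieved specs."""
        parts = [self.constitution]
        parts += [a.instructions for a in self.select_agents(task)]
        parts += [self.specs[title] for title in self.query_specs(task)]
        return "\n\n".join(parts)


# Hypothetical sample data, for illustration only.
store = ContextStore(
    constitution="Always run the test suite before committing.",
    agents=[Agent("networking", {"network", "socket"}, "Networking conventions go here.")],
    specs={
        "netcode": "network packets use delta compression",
        "save-system": "save files are versioned",
    },
)

context = store.build_context("fix network packets")
print(context)
```

The point of the split is that only the first tier costs context on every task; the other two are pulled in when the task calls for them.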
Across 283 sessions, the project logged 2,801 human prompts, 1,197 agent invocations, and 16,522 autonomous agent turns (roughly 6 autonomous turns per human prompt), with a knowledge-to-code ratio of 24.2%.
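As a quick sanity check, the headline ratios can be recomputed from the figures quoted in this post. This sketch assumes the knowledge-to-code ratio means total context lines (all three tiers) divided by lines of code; that definition is an assumption here, and the inputs are the rounded numbers above, so the result only approximates the reported 24.2%.

```python
# Figures as quoted in the post (several are rounded).
human_prompts = 2_801
autonomous_turns = 16_522

constitution_lines = 660
agent_lines = 9_300
spec_lines = 16_250   # quoted as approximate
code_lines = 108_000  # quoted as approximate

turns_per_prompt = autonomous_turns / human_prompts
context_lines = constitution_lines + agent_lines + spec_lines

# Assumed definition: context lines divided by code lines.
knowledge_to_code = context_lines / code_lines

print(f"autonomous turns per human prompt: {turns_per_prompt:.1f}")
print(f"context lines: {context_lines}")
print(f"knowledge-to-code ratio: {knowledge_to_code:.1%}")
```

With these rounded inputs the ratio comes out slightly above 24%, consistent with the reported figure to within rounding.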
Crucially, none of it was designed upfront. Each new agent and specification emerged from a real failure (a recurring bug, an architectural mistake, a forgotten convention) and was codified so it would never need re-explaining. That turns documentation into load-bearing infrastructure that agents depend on as memory, not reference.
Paper: https://arxiv.org/abs/2602.20478


