{"id":"2038982104445538595","url":"https://x.com/neural_avb/status/2038982104445538595","text":"","author":{"name":"AVB","username":"neural_avb","avatarUrl":"https://pbs.twimg.com/profile_images/2015375309147611136/WKvfQ-oV_200x200.jpg"},"createdAt":"Tue Mar 31 14:09:50 +0000 2026","engagement":{"replies":5,"retweets":19,"likes":252,"views":38190},"article":{"title":"The Claude Code Harness - Everything you need to know","previewText":"So the claude code repo leaked. Sad day for Anthropic, but great day for AI engineers coz we finally get to learn from one of the greatest coding harnesses ever put together. So let's unpack!\nI asked","coverImageUrl":"https://pbs.twimg.com/media/HEvrQ5Mb0AE7jWV.jpg","content":"So the claude code repo leaked. Sad day for Anthropic, but great day for AI engineers coz we finally get to learn from one of the greatest coding harnesses ever put together. So let's unpack!\n\nI asked Codex to browse through the repo and write a .md file containing all its learnings. This article contains the main design takeaways. At the end of the article, there are some prompts CC uses. I also asked Codex to compare its own tools with what Claude Code has, and those results were also very illuminating.\n\nAccording to Codex, what makes Claude Code good is not one prompt or one model trick. 
It is the combination of:\n\n- A strong system prompt with explicit operational norms.\n\n- A dedicated tool surface instead of overusing shell.\n\n- A real execution harness around the model: permissions, hooks, checks, retries, compaction, and telemetry.\n\n- Careful prompt-cache-aware engineering.\n\n- Subagents/forks that let the main thread stay clean and responsive.\n\n- Context-management systems that preserve continuity without letting long sessions fall apart.\n\n> The core design pattern is: treat the model as one component inside a stateful agent runtime, not as the runtime itself\n\n# The System Prompt\n\nThe system prompt is assembled from many sections instead of being one monolith. In other words, it is constructed by assembling multiple named snippets and stitching them together.\n\nThere is explicit prompt priority logic: override prompt, coordinator prompt, agent prompt, custom prompt, default prompt.\n\nThe code repeatedly avoids adding per-session variability to the static prefix because that would destroy prompt cache efficiency.\n\nIt means they are optimizing for two things at once:\n\n  1. different kinds of sessions need different instructions\n  2. they do not want to pay the full token cost every turn\n\n> They use two interesting functionalities: DANGEROUS_uncachedSystemPromptSection and SYSTEM_PROMPT_DYNAMIC_BOUNDARY.\n\nThey split the prompt into a stable front half and a dynamic back half so they can customize behavior without ruining prompt caching.\n\n  What DANGEROUS_uncachedSystemPromptSection(...) 
means:\n\n  - “This section changes often, so don’t cache it blindly.”\n  - They call it dangerous because changing prompt text breaks cache reuse.\n\n  What SYSTEM_PROMPT_DYNAMIC_BOUNDARY means:\n  - “Everything before this line should stay stable.”\n  - “Everything after this line is allowed to vary per session.”\n\n> Just remember that hitting caches is great coz it allows LLM providers to reuse old computation and serve you responses at a subsidized price.\n\nIf you have a long chat with Opus, and switch to Haiku with an 800K-token context, the cost of that one Haiku invocation is likely more than just continuing with Opus.\n\nThis is coz the Opus KV cache for your conversation already exists, but they have to recompute all your tokens for Haiku.\n\nSo it is in their interest to be super meticulous about prompt building so that caches get hit as much as possible and they can serve Claude models to you at higher volume.\n\nTLDR: They split the prompt into a stable front half and a dynamic back half so they can customize behavior without ruining prompt caching.\n\n# The Tools\n\nThe default prompt teaches operating style: read files before editing, use dedicated tools over shell, be concise, confirm risky actions, avoid blind retries, and report outcomes faithfully.\n\nTool prompts are used to shape behavior at the tool level, not just the global level. \n\n> Example: the edit tool requires a prior read and teaches exact-string replacement discipline.\n\n> Example: the bash tool prompt pushes the model toward shell only when dedicated tools are insufficient.\n\n> Example: the agent tool prompt teaches when to delegate, when to fork, and how to write a proper subagent brief.\n\n> Example: the AskUserQuestion prompt constrains how the model asks for clarification so it fits the product UX.\n\nThe harness does not rely on one “master prompt”. 
They distribute behavior constraints across the exact surfaces where mistakes happen.\n\n> Claude Code exposes purpose-built tools for common actions instead of forcing everything through shell.\n\n- `tools.ts` is effectively the source of truth for the available tool pool.\n\n- Tools can be filtered by permissions before the model even sees them.\n\n- Some tools are deferred behind `ToolSearchTool`, which reduces prompt size and preserves cache efficiency.\n\n- Tool calls are partitioned into concurrency-safe and non-concurrency-safe batches.\n\n- Read-only or safe calls can run in parallel; mutating calls are serialized.\n\n- Streaming tool execution preserves ordering for user-visible results while still allowing background parallel execution.\n\n- Context modifications from tool execution are queued and applied carefully so parallel reads do not corrupt runtime state.\n\n- Bash sibling cancellation is explicitly handled so one failed shell call can abort related concurrent shell work.\n\n- Many agent demos stop at “model emits tool call”. Claude Code handles the harder problem: how tool calls behave in a long-running, stateful, partially parallel runtime.\n\nHere are the tools Codex found:\n\nThen I asked Codex to write about tools that it has but CC doesn't, and tools that CC has and Codex doesn't.\n\n# Hooks\n\nHooks are things that run before or after (pre or post) the actual LLM call. 
\n\n- There are shell-command hooks, callback hooks, HTTP hooks, prompt hooks, and agent hooks.\n\n- Session-scoped hooks can be registered dynamically from agent frontmatter or skill frontmatter.\n\n- Agent frontmatter `Stop` hooks are remapped to `SubagentStop`, which is a subtle but excellent detail.\n\n“Frontmatter” means metadata written at the top of a file, usually inside a block delimited by `---` lines.\n\nIn this codebase, frontmatter is used to define things like agent metadata, skill metadata, allowed/disallowed tools, model choice, etc.\n\n- Stop hooks are integrated into the query loop and can block continuation, add context, or generate structured summaries.\n\n- Internal post-sampling hooks are used for systems like session memory.\n\n# Indexing\n\nThis note summarizes what I could find in the source about how Claude Code handles indexing for a new codebase.\n\n> Claude Code does appear to build an index, but the clearly visible built-in index is a fast file-path fuzzy index, not a full semantic code graph.\n\nSo if you open a brand new repository, the evidence in this source suggests Claude Code primarily does this:\n\n1. collect file paths\n2. collect directory paths\n3. build an in-memory fuzzy index over those paths\n4. 
refresh that index when the repo changes\n\nThe clearest indexing implementation is here:\n`src/native-ts/file-index/index.ts`\n\nThat file defines a `FileIndex` with APIs like:\n\n- `new FileIndex()`\n- `loadFromFileList(fileList: string[])`\n- `loadFromFileListAsync(fileList: string[])`\n- `search(query: string, limit: number)`\n\nHow A New Codebase Likely Gets Indexed\n\n- Step 1: collect project files\n\nClaude Code first tries:\n- `git ls-files`\n\nIf that does not work, it falls back to:\n- `ripgrep --files`\n\n- Step 2: include config files\n\nIt also loads Claude config markdown files from config directories.\n\n- Step 3: derive directory entries\n\nIt computes parent directories from the file list so folders can also be searched/suggested.\n\n- Step 4: build the fuzzy index\n\nIt creates or reuses a singleton `FileIndex`. This build is async and chunked so the UI stays responsive.\n\n- Step 5: refresh when the repo changes\n\nIt tracks git index mtime and throttles refreshes.\n\nSo this is not a one-time permanent index database. 
It looks more like an **in-memory, refreshable working index** for path suggestions.\n\nWhat This Index Is Good For\n\nThis built-in index is useful for:\n\n- fuzzy file lookup\n- path completion\n- fast navigation in large repos\n\nIt is optimized for responsiveness and search quality, not deep semantic understanding.\n\nWhat It Does Not Obviously Do\n\nFrom the visible source, Codex does not see a built-in default system that clearly constructs:\n\n- a symbol graph\n- a call graph\n- embeddings over code\n- a semantic search database\n- a repo-wide AST index used as the main navigation substrate\n\nWhere Deeper Indexing Might Come From\n\nClaude Code also has files for accessing LSP servers as well as MCP servers.\nThis suggests Claude Code can rely on language servers for:\n- definitions\n- references\n- symbol information\n- hover info\n\n# Compaction is a whole subsystem\n\nCompaction is what happens when the conversation has grown too long: the harness summarizes the chat so that you can continue the conversation without overflowing the model's context window.\n\n- The query loop proactively considers microcompact, context collapse, autocompact, and reactive compact.\n\n- Compaction is deeply integrated with token accounting, retries, error handling, and analytics.\n\n- There is special handling for prompt-too-long failure modes, including retry logic that drops older context strategically.\n\n- Compact boundaries are represented as explicit messages in the transcript.\n\n- After compaction, the system rebuilds the post-compact message set in a structured way rather than just replacing everything with one summary blob.\n\n# Subagents\n\nAccording to Codex, Claude Code has a serious subagent system. 
It is not just “spawn another model call.”\n\nSubagents in Claude Code are:\n- first-class tools\n- configurable by type\n- permission-aware\n- tool-restricted\n- integrated with the main query loop\n- aware of context, prompt caching, and session state\n\n> There are also two different ideas mixed into the design:\n- fresh specialized agents with their own prompt and tool restrictions\n- forked agents that inherit the parent’s context and are optimized for prompt-cache reuse\n\nCore Idea\n\nClaude Code exposes subagents through the `Agent` tool.\n\nThat means the main model can explicitly decide:\n- whether to delegate\n- what kind of agent to spawn\n- whether to spawn a specialized fresh agent\n- whether to fork itself\n- whether to run the agent in background\n- whether to isolate the agent in a worktree or remote environment\n\nThis is much more structured than “just call another model.”\n\n## Two Main Kinds Of Subagents\n\n1. Fresh specialized agents\n\nThese are agents started with a `subagent_type`.\n\nExamples in the source:\n- `Explore`\n- `Plan`\n- `general-purpose`\n- `verification`\n\nThese agents can have:\n\n- their own system prompt\n- their own allowed/disallowed tools\n- their own model preference\n- their own frontmatter-defined behavior\n- their own MCP server requirements\n\n> These agents do **not** rely on vague delegation. Claude Code’s prompt explicitly tells the parent agent to write a proper brief for them.\n\n2. Forked agents\n\nWhen forking is enabled, calling `Agent` without a `subagent_type` creates a fork.\n\nForks are different from fresh specialized agents because they:\n\n- inherit the parent conversation context\n- are designed to share prompt cache\n- keep the main thread cleaner by offloading noisy intermediate work\n\nThis is a very important design choice. 
Claude Code is not just using subagents for intelligence; it is also using them for context hygiene.\n\nTool Restriction Model\n\nOne of the strongest things about Claude Code’s subagent system is that agents are not just prompt variants. They can have real tool constraints:\n\n- `Explore` disallows edit/write/notebook-edit/agent nesting\n- `Plan` does the same\n- agent definitions can specify allowed tools and disallowed tools\n\nThis matters because it turns agents into actual roles, not just moods.\n\nSubagents can receive:\n- a base/default agent prompt\n- an agent-specific system prompt\n- environment notes\n- runtime-specific additions\n\nSo subagents have their own prompt stack, not merely the parent prompt copied blindly.\n\nWhat The Main Prompt Says About Subagents\n\n> The agent tool prompt is one of the most educational parts of the system.\n-- Codex\n\nThe file is `src/tools/AgentTool/prompt.ts`\n\nImportant ideas from that prompt:\n- use subagents when the task matches an agent description\n- do not overuse them\n- do not duplicate work already delegated\n- forks are useful when intermediate output is not worth keeping in the main context\n- fresh agents need a full brief because they start with zero context\n- the parent should not fabricate results from a fork that has not finished yet\n\nClaude Code is teaching the model not only how to delegate, but also how to avoid bad delegation patterns.\n\n# Notes about each subagent\n\n1. Explore agent\n\n- explicitly read-only\n- optimized for searching codebases\n- instructed to use search/read tools efficiently\n- forbidden from file modifications\n- encouraged to run parallel search/read operations\n\nThis is basically a specialized codebase reconnaissance agent.\n\n2. Plan agent\n\n- explicitly read-only\n- focused on designing implementation plans\n- forbidden from editing files\n- expected to identify critical files and implementation sequence\n\n> This agent is not meant to code. 
It is meant to think structurally.\n-- Codex\n\n3. Verification agent\n\nMentioned in the prompt and built-in agent set, this reflects another important idea:\n\n- implementation and verification can be separated\n- the main reporting agent should not be the only judge of correctness\n\n> That is a strong anti-self-deception design choice.\n-- Codex\n\n# Claude Code vs Codex Harness\n\nI asked Codex to tell me how CC compares with Codex. Here is the file it produced copy pasted. All of this section is Codex generated.\n\nThis section compares the Claude Code harness in this repository with the Codex-style harness I am currently operating inside for this session. This is an objective architectural comparison, not a judgment of which is universally better.\n\n> - Claude Code is a **product runtime** with a large amount of in-product orchestration: hooks, permission modes, compact systems, memory systems, session metadata, dynamic tool surfaces, and user-facing workflow control.\n\n> - The Codex harness is closer to a **general agent execution shell**: simpler outer runtime, stronger turn-by-turn coding discipline, explicit developer-tool APIs, and less product-specific lifecycle machinery.\n\n1. Claude Code is more productized; Codex is more execution-oriented\n\nClaude Code’s source shows a harness designed for a polished end-user product:\n\n- rich permission UX\n- slash-command and skill model\n- session-scoped hooks\n- UI-oriented reminders and init messages\n- long-session continuity systems\n- feature-gated runtime variants\n\nThe Codex harness I’m using is more execution-centered:\n\n- explicit tool namespaces\n- stronger directness around making edits, running inspections, and reporting outcomes\n- less emphasis on in-band product UX features like plan-mode approvals, skill surfacing, or session memory\n\nClaude Code feels designed to be a durable interactive environment. 
Codex feels designed to be a strong software execution agent inside a simpler shell.\n\n2. Claude Code invests more in runtime adaptation; Codex invests more in instruction discipline\n\nClaude Code compensates for model/runtime complexity with many adaptive subsystems:\n- compaction layers\n- session memory\n- hook pipelines\n- permission classifiers\n- tool deferral\n- MCP instruction deltas\n\nThe Codex harness relies more heavily on:\n- strong developer instructions\n- explicit tool affordances\n- clear editing constraints\n- simpler stepwise execution discipline\n\n> Put differently:\n- Claude Code says: “build a lot of runtime machinery around the model.”\n- Codex says: “make the agent follow a stricter operating contract.”\n\nBoth are valid. They sit at different points on the runtime-versus-policy spectrum.\n\nSubagent architectures\n\nAccording to Codex:\n\n> Claude Code treats subagents as product-level roles exposed through one main `Agent` tool. \n\nCodex treats subagents as explicit harness-controlled worker processes exposed through multiple lifecycle tools.\n\nThat is the strongest difference.\n\nClaude Code Subagents\n\nClaude Code exposes subagents mainly through a single high-level tool:\n\n- `Agent`\n\nThe parent model says what kind of agent it wants, and Claude Code handles the rest inside the product runtime.\n\nCodex Subagents\n\nIn this harness, subagents are exposed more explicitly through separate tools:\n\n- `spawn_agent`\n- `send_input`\n- `wait_agent`\n- `resume_agent`\n- `close_agent`\n\nCore model\n\nCodex does not wrap subagents inside a single product abstraction like `Agent`.\n\nInstead, it exposes the subagent lifecycle directly.\n\n- spawning is explicit\n- agent IDs are explicit\n- waiting is explicit\n- resuming is explicit\n- closing is explicit\n- you can route messages to an already-running subagent directly\n- the harness expects the main agent to manage coordination more manually\n\n> Claude Code says: “Here is one 
high-level delegation tool. Think in terms of agent roles. Let the runtime handle much of the structure.”\n\nSo the parent model mostly thinks in product terms:\n\n- use `Explore`\n- use `Plan`\n- fork for research\n- background this task\n\n> Codex says: “Here are the primitives for managing subagents. You coordinate them explicitly. The harness gives you direct lifecycle control.”\n\nSo the parent agent thinks more in operator terms:\n\n- spawn this worker\n- send more input\n- wait for completion\n- close when done\n\nPros and cons through an objective lens (Codex written)\n\n> Claude Code is a heavier, more productized agent operating system. Codex is a leaner, stricter coding-agent harness.\n\n-- Codex 😭\n\n# Dump of prompts\n\n## Important files"},"adhxContext":{"savedByCount":1,"publicTags":[],"previewUrl":"https://adhx.com/neural_avb/status/2038982104445538595"}}