AVB

@neural_avb

The Claude Code Harness - Everything you need to know

So the Claude Code repo leaked. Sad day for Anthropic, but great day for AI engineers coz we finally get to learn from one of the greatest coding harnesses ever put together. So let's unpack!

I asked Codex to browse through the repo and write a .md file containing all its learnings. This article contains the main design takeaways. At the end of the article, there are some prompts CC uses. I also asked Codex to compare its own tools with what Claude Code has, and those results were also very illuminating.

According to Codex, what makes Claude Code good is not one prompt or one model trick. It is the combination of:

A strong system prompt with explicit operational norms.

A dedicated tool surface instead of overusing shell.

A real execution harness around the model: permissions, hooks, checks, retries, compaction, and telemetry.

Careful prompt-cache-aware engineering.

Subagents/forks that let the main thread stay clean and responsive.

Context-management systems that preserve continuity without letting long sessions fall apart.

The core design pattern is: treat the model as one component inside a stateful agent runtime, not as the runtime itself.

The System Prompt

The system prompt is assembled from many sections instead of being one monolith. In other words, the code builds it by stitching multiple snippets together.

There is explicit prompt priority logic: override prompt, coordinator prompt, agent prompt, custom prompt, default prompt.
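One way to read that priority list is as a first-match lookup. Here is a minimal sketch, assuming every source except the default is optional and the first one present wins. The `PromptSources` type and `resolveSystemPrompt` function are hypothetical names for illustration, not taken from the repo.

```typescript
// Hypothetical shape: every source except the default is optional.
interface PromptSources {
  override?: string;
  coordinator?: string;
  agent?: string;
  custom?: string;
  default: string;
}

// Assumption: the first source that is present (and not blank) wins,
// checked in the priority order listed in the article.
function resolveSystemPrompt(sources: PromptSources): string {
  const ordered: Array<string | undefined> = [
    sources.override,
    sources.coordinator,
    sources.agent,
    sources.custom,
  ];
  for (const candidate of ordered) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return sources.default;
}
```

With this shape, a session that sets only `agent` and `custom` gets the agent prompt, and a session that sets nothing falls through to the default.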

The code repeatedly avoids adding per-session variability to the static prefix because that would destroy prompt cache efficiency.

It means they are optimizing for two things at once:

1. different kinds of sessions need different instructions
2. they do not want to pay the full token cost every turn

They use two interesting constructs: DANGEROUS_uncachedSystemPromptSection and SYSTEM_PROMPT_DYNAMIC_BOUNDARY.

They split the prompt into a stable front half and a dynamic back half so they can customize behavior without ruining prompt caching.

What DANGEROUS_uncachedSystemPromptSection(...) means:

- “This section changes often, so don’t cache it blindly.”
- They call it dangerous because changing prompt text breaks cache reuse.

What SYSTEM_PROMPT_DYNAMIC_BOUNDARY means:

- “Everything before this line should stay stable.”
- “Everything after this line is allowed to vary per session.”
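To make the stable/dynamic split concrete, here is a small sketch. The article does not show the real helpers' signatures, so `BOUNDARY_MARKER`, `PromptSection`, `buildPrompt`, and `cacheablePrefix` are all hypothetical stand-ins. The idea is only that per-session text is forced behind a marker so the prefix stays byte-identical.

```typescript
// Hypothetical marker; the real constant's value is not shown in this article.
const BOUNDARY_MARKER = "<<DYNAMIC_BOUNDARY>>";

interface PromptSection {
  text: string;
  // true for content that changes per session (cwd, date, git status, ...)
  dynamic: boolean;
}

// Put every stable section before the marker and every dynamic one after it,
// so the text before the marker is identical across sessions.
function buildPrompt(sections: PromptSection[]): string {
  const stable = sections.filter((s) => !s.dynamic).map((s) => s.text);
  const dynamic = sections.filter((s) => s.dynamic).map((s) => s.text);
  return [...stable, BOUNDARY_MARKER, ...dynamic].join("\n");
}

// The cacheable part is whatever precedes the marker.
function cacheablePrefix(prompt: string): string {
  const idx = prompt.indexOf(BOUNDARY_MARKER);
  return idx === -1 ? prompt : prompt.slice(0, idx);
}
```

Two sessions that differ only in their dynamic sections produce different full prompts but the same `cacheablePrefix`, which is the property a prefix cache needs.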

Just remember that hitting caches is great coz it allows LLM providers to reuse old computation and serve you responses at a subsidized price. If you have a long chat with Opus and then switch to Haiku with an 800K-token context, the cost of that one Haiku invocation is likely more than just continuing with Opus. This is coz the KV cache for your conversation was already computed for Opus, but the provider has to recompute all of those tokens for Haiku. So they have every incentive to be super meticulous about prompt building, so that caches get hit as much as possible and they can serve Claude models to you at higher volume.
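The arithmetic behind that claim can be sketched like this. Every number below is invented purely for illustration and is NOT real pricing. The sketch also ignores cache-write costs and output tokens. `ModelPricing` and `turnInputCost` are hypothetical names.

```typescript
// All prices here are made up for illustration; they are NOT real pricing.
interface ModelPricing {
  inputPerMTok: number;        // cost units per million uncached input tokens
  cacheReadMultiplier: number; // fraction of the input price paid for cached tokens
}

// Input cost of one turn: cached tokens are billed at a discount,
// fresh tokens at the full input price.
function turnInputCost(cachedTokens: number, freshTokens: number, pricing: ModelPricing): number {
  const cachedCost = cachedTokens * pricing.inputPerMTok * pricing.cacheReadMultiplier;
  const freshCost = freshTokens * pricing.inputPerMTok;
  return (cachedCost + freshCost) / 1000000;
}

const bigModel: ModelPricing = { inputPerMTok: 5, cacheReadMultiplier: 0.1 };
const smallModel: ModelPricing = { inputPerMTok: 1, cacheReadMultiplier: 0.1 };

// Same 800K-token conversation plus a 1K-token new message.
const stayOnBig = turnInputCost(800000, 1000, bigModel);    // history is a cache hit
const switchToSmall = turnInputCost(0, 801000, smallModel); // nothing cached for this model yet
```

Under these made-up numbers the "cheaper" model costs roughly twice as much for that one turn, because none of its input is a cache hit.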

TLDR: They split the prompt into a stable front half and a dynamic back half so they can customize behavior without ruining prompt caching.

The Tools

The default prompt teaches operating style: read files before editing, use dedicated tools over shell, be concise, confirm risky actions, avoid blind retries, and report outcomes faithfully.

Tool prompts are used to shape behavior at the tool level, not just the global level.

Example: the edit tool requires a prior read and teaches exact-string replacement discipline.

Example: the bash tool prompt pushes the model toward shell only when dedicated tools are insufficient.

Example: the agent tool prompt teaches when to delegate, when to fork, and how to write a proper subagent brief.

Example: the AskUserQuestion prompt constrains how the model asks for clarification so it fits the product UX.

The harness does not rely on one “master prompt”. They distribute behavior constraints across the exact surfaces where mistakes happen.
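The edit tool example is the easiest one to sketch, because the prompt-level rule ("read before you edit, replace exact strings") can also be enforced in code. This is a toy in-memory model, not the real tool. `EditSession` is a hypothetical name, and the "match must be unique" rule is my assumption about what exact-string discipline implies.

```typescript
// Hypothetical in-memory model of an edit tool; names are illustrative only.
class EditSession {
  private files: Record<string, string>;
  private readonly readPaths = new Set<string>();

  constructor(files: Record<string, string>) {
    this.files = { ...files };
  }

  read(path: string): string {
    const content = this.files[path];
    if (content === undefined) throw new Error(`no such file: ${path}`);
    this.readPaths.add(path);
    return content;
  }

  // Replace exactly one occurrence of oldText; refuse anything ambiguous.
  edit(path: string, oldText: string, newText: string): void {
    if (!this.readPaths.has(path)) {
      throw new Error("read the file before editing it");
    }
    const content = this.files[path];
    if (content === undefined) throw new Error(`no such file: ${path}`);
    const first = content.indexOf(oldText);
    if (oldText === "" || first === -1) throw new Error("oldText not found");
    // Assumption: a second match means the edit is ambiguous, so reject it.
    if (content.indexOf(oldText, first + oldText.length) !== -1) {
      throw new Error("oldText is not unique; include more context");
    }
    this.files[path] =
      content.slice(0, first) + newText + content.slice(first + oldText.length);
  }
}
```

The point of the design is that the model cannot "edit from memory": a stale or guessed `oldText` fails loudly instead of silently corrupting the file.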

Claude Code exposes purpose-built tools for common actions instead of forcing everything through shell.

- `tools.ts` is effectively the source of truth for the available tool pool.

- Tools can be filtered by permissions before the model even sees them.

- Some tools are deferred behind `ToolSearchTool`, which reduces prompt size and preserves cache efficiency.

- Tool calls are partitioned into concurrency-safe and non-concurrency-safe batches.

- Read-only or safe calls can run in parallel; mutating calls are serialized.

- Streaming tool execution preserves ordering for user-visible results while still allowing background parallel execution.

- Context modifications from tool execution are queued and applied carefully so parallel reads do not corrupt runtime state.

- Bash sibling cancellation is explicitly handled so one failed shell call can abort related concurrent shell work.

- Many agent demos stop at “model emits tool call”. Claude Code handles the harder problem: how tool calls behave in a long-running, stateful, partially parallel runtime.
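The "safe calls in parallel, mutating calls serialized" idea from the list above can be sketched as a batching step. This is my guess at the shape, not the repo's code. `PendingToolCall`, the `concurrencySafe` flag, and `partitionIntoBatches` are hypothetical.

```typescript
interface PendingToolCall {
  id: string;
  tool: string;
  // Assumption: each tool declares whether it may run alongside others.
  concurrencySafe: boolean;
}

// Group consecutive safe calls into one batch that may run in parallel;
// every unsafe call gets a batch of its own. Order across batches is kept.
function partitionIntoBatches(calls: PendingToolCall[]): PendingToolCall[][] {
  const batches: PendingToolCall[][] = [];
  let currentSafe: PendingToolCall[] = [];
  for (const call of calls) {
    if (call.concurrencySafe) {
      currentSafe.push(call);
      continue;
    }
    if (currentSafe.length > 0) {
      batches.push(currentSafe);
      currentSafe = [];
    }
    batches.push([call]);
  }
  if (currentSafe.length > 0) batches.push(currentSafe);
  return batches;
}
```

A runtime would then run each batch to completion before starting the next one, so a read issued after an edit never observes the file mid-write.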

Here are the tools Codex found:

Then I asked Codex to write about tools that it has but CC doesn't, and tools that CC has and Codex doesn't.

Hooks

Hooks are things that run before or after (pre or post) the actual LLM call.

- There are shell-command hooks, callback hooks, HTTP hooks, prompt hooks, and agent hooks.

- Session-scoped hooks can be registered dynamically from agent frontmatter or skill frontmatter.

- Agent frontmatter `Stop` hooks are remapped to `SubagentStop`, which is a subtle but excellent detail.

“Frontmatter” means metadata written at the top of a file, usually inside a block delimited by `---` lines.

In this codebase, frontmatter is used to define things like agent metadata, skill metadata, allowed/disallowed tools, model choice etc.
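Here is a rough sketch of what such a file and a reader for it could look like. The field names in `agentFile` are illustrative, not copied from the repo. Real frontmatter is YAML, so a real implementation would use a YAML parser rather than this flat `key: value` splitter.

```typescript
// Hypothetical agent definition file; the field names here are illustrative.
const agentFile = [
  "---",
  "name: explorer",
  "model: haiku",
  "disallowedTools: Edit, Write",
  "---",
  "You are a read-only codebase explorer.",
].join("\n");

interface ParsedFile {
  meta: Record<string, string>;
  body: string;
}

// Minimal parser for flat `key: value` frontmatter between two `---` lines.
function parseFrontmatter(source: string): ParsedFile {
  const lines = source.split("\n");
  if (lines[0] !== "---") return { meta: {}, body: source };
  const end = lines.indexOf("---", 1);
  if (end === -1) return { meta: {}, body: source };
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not key: value
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}
```

Everything after the closing `---` is the prompt body, and everything inside the block is configuration the harness can act on before the model ever sees the file.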

- Stop hooks are integrated into the query loop and can block continuation, add context, or generate structured summaries.

- Internal post-sampling hooks are used for systems like session memory.
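A minimal sketch of the Stop-hook behavior described above: hooks can block continuation or add context, and a `Stop` hook declared by an agent gets remapped to `SubagentStop`. All type and function names here (`HookResult`, `registerAgentHook`, `runHooks`) are hypothetical. Only the two event names come from the article.

```typescript
type HookEventName = "Stop" | "SubagentStop";

interface HookResult {
  block?: boolean;     // true = do not let the agent stop yet
  addContext?: string; // extra text to feed back into the loop
}

type HookFn = (transcript: string[]) => HookResult;

interface RegisteredHook {
  event: HookEventName;
  run: HookFn;
}

// Hooks declared in agent frontmatter: `Stop` is remapped to `SubagentStop`
// so it fires when the subagent finishes, not when the main session does.
function registerAgentHook(registry: RegisteredHook[], event: HookEventName, run: HookFn): void {
  const mapped: HookEventName = event === "Stop" ? "SubagentStop" : event;
  registry.push({ event: mapped, run });
}

// Run every hook registered for the event and merge their verdicts.
function runHooks(
  registry: RegisteredHook[],
  event: HookEventName,
  transcript: string[],
): { blocked: boolean; context: string[] } {
  let blocked = false;
  const context: string[] = [];
  for (const hook of registry) {
    if (hook.event !== event) continue;
    const result = hook.run(transcript);
    if (result.block) blocked = true;
    if (result.addContext) context.push(result.addContext);
  }
  return { blocked, context };
}
```

The remap matters because otherwise an agent-scoped hook would fire on the parent session's stop event, which the agent definition knows nothing about.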

Indexing

This note summarizes what I could find in the source about how Claude Code handles indexing for a new codebase.

Claude Code does appear to build an index, but the clearly visible built-in index is a fast file-path fuzzy index, not a full semantic code graph.

So if you open a brand new repository, the evidence in this source suggests Claude Code primarily does this:

1. collect file paths
2. collect directory paths
3. build an in-memory fuzzy index over those paths
4. refresh that index when the repo changes

The clearest indexing implementation is here:
`src/native-ts/file-index/index.ts`

That file defines a `FileIndex` with APIs like:

- `new FileIndex()`
- `loadFromFileList(fileList: string[])`
- `loadFromFileListAsync(fileList: string[])`
- `search(query: string, limit: number)`
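To get a feel for what a path-only fuzzy index does, here is a toy stand-in that mirrors two of the method names listed above. The matching and scoring (a plain subsequence match ranked by span length) are my assumption. The article does not describe the real algorithm, and `ToyFileIndex` and `subsequenceScore` are hypothetical.

```typescript
// Toy stand-in mirroring the method names above; the scoring is an assumption
// (plain subsequence match), not the repo's actual algorithm.
class ToyFileIndex {
  private paths: string[] = [];

  loadFromFileList(fileList: string[]): void {
    this.paths = [...fileList];
  }

  search(query: string, limit: number): string[] {
    const q = query.toLowerCase();
    const scored: Array<{ path: string; score: number }> = [];
    for (const path of this.paths) {
      const score = subsequenceScore(q, path.toLowerCase());
      if (score !== null) scored.push({ path, score });
    }
    // Lower score = tighter match; ties are broken alphabetically.
    scored.sort((a, b) => a.score - b.score || a.path.localeCompare(b.path));
    return scored.slice(0, limit).map((s) => s.path);
  }
}

// Returns null when query is not a subsequence of target. Otherwise returns
// the span length of the first greedy match (smaller means tighter).
function subsequenceScore(query: string, target: string): number | null {
  if (query.length === 0) return 0;
  let qi = 0;
  let start = -1;
  for (let ti = 0; ti < target.length; ti++) {
    if (target[ti] === query[qi]) {
      if (start === -1) start = ti;
      qi++;
      if (qi === query.length) return ti - start + 1;
    }
  }
  return null;
}
```

Typing `fidx` would match `src/native-ts/file-index/index.ts` because the letters appear in order, even though they are not adjacent. Note that nothing here looks inside the files, which is exactly the limitation the article describes.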

How A New Codebase Likely Gets Indexed

Step 1: collect project files

Claude Code first tries:
- `git ls-files`

If that does not work, it falls back to:
- `rg --files` (ripgrep)
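The fallback logic is simple enough to sketch. To keep the example runnable without spawning real processes, the command runner is injected. `CommandRunner` and `collectProjectFiles` are hypothetical names, and I am assuming ripgrep is invoked through its usual `rg` binary.

```typescript
// A command runner returns stdout lines, or null when the command fails.
// It is injected so the sketch runs without spawning real processes.
type CommandRunner = (command: string, args: string[]) => string[] | null;

function collectProjectFiles(run: CommandRunner): string[] {
  // Preferred source: files tracked by git.
  const tracked = run("git", ["ls-files"]);
  if (tracked !== null) return tracked;
  // Not a git repo (or git failed): fall back to ripgrep's file listing.
  const listed = run("rg", ["--files"]);
  return listed ?? [];
}
```

Both commands skip gitignored files by default, so directories like `node_modules` normally stay out of the index on either path.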

Step 2: include config files

It also loads Claude config markdown files from config directories.

Step 3: derive directory entries

It computes parent directories from the file list so folders can also be searched/suggested.
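Deriving directories from a flat file list could look like the sketch below. `deriveDirectories` is a hypothetical helper, and the trailing `/` on directory entries is my assumption to keep them distinguishable from files.

```typescript
// Collect every ancestor directory of every file path, without duplicates.
function deriveDirectories(files: string[]): string[] {
  const dirs = new Set<string>();
  for (const file of files) {
    const parts = file.split("/");
    // Skip the last part (the file name) and add each prefix: "a/", "a/b/", ...
    for (let i = 1; i < parts.length; i++) {
      dirs.add(parts.slice(0, i).join("/") + "/");
    }
  }
  return Array.from(dirs).sort();
}
```

For `["src/tools/a.ts", "src/b.ts", "README.md"]` this yields `src/` and `src/tools/`, and top-level files contribute nothing.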

Step 4: build the fuzzy index

It creates or reuses a singleton `FileIndex`. This build is async and chunked so the UI stays responsive.

Step 5: refresh when the repo changes

It tracks git index mtime and throttles refreshes.
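A throttled, mtime-driven refresh could be sketched like this. The clock and the mtime are passed in as plain numbers so the example stays deterministic. `ThrottledRefresher` and its parameters are hypothetical, and the real throttling policy may differ.

```typescript
// Hypothetical refresher: rebuilds only when the git index mtime changed AND
// the minimum interval since the last rebuild has elapsed.
class ThrottledRefresher {
  private readonly minIntervalMs: number;
  private readonly rebuild: () => void;
  private lastMtimeMs = -1;
  private lastRefreshMs = Number.NEGATIVE_INFINITY;

  constructor(minIntervalMs: number, rebuild: () => void) {
    this.minIntervalMs = minIntervalMs;
    this.rebuild = rebuild;
  }

  // mtimeMs: current modification time of the git index file.
  // nowMs: current clock reading. Returns true when a rebuild happened.
  maybeRefresh(mtimeMs: number, nowMs: number): boolean {
    if (mtimeMs === this.lastMtimeMs) return false; // nothing changed
    if (nowMs - this.lastRefreshMs < this.minIntervalMs) return false; // too soon
    this.lastMtimeMs = mtimeMs;
    this.lastRefreshMs = nowMs;
    this.rebuild();
    return true;
  }
}
```

Checking one file's mtime is far cheaper than re-listing the repo, and the throttle keeps a burst of git operations (say, a rebase) from triggering a rebuild per commit.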

So this is not a one-time permanent index database. It looks more like an **in-memory, refreshable working index** for path suggestions.

What This Index Is Good For

This built-in index is useful for:

- fuzzy file lookup
- path completion
- fast navigation in large repos

It is optimized for responsiveness and search quality, not deep semantic understanding.

What It Does Not Obviously Do

From the visible source, Codex does not see a built-in default system that clearly constructs:

- a symbol graph
- a call graph
- embeddings over code
- a semantic search database
- a repo-wide AST index used as the main navigation substrate

Where Deeper Indexing Might Come From

Claude Code also has code for talking to LSP servers, as well as MCP servers.
This suggests Claude Code can rely on language servers for:
- definitions
- references
- symbol information
- hover info

Compaction is a whole subsystem

Compaction is what happens when the conversation has gone on too long: the harness summarizes the chat so that you can continue the conversation without overflowing the model's context window.

- The query loop proactively considers microcompact, context collapse, autocompact, and reactive compact.

- Compaction is deeply integrated with token accounting, retries, error handling, and analytics.

- There is special handling for prompt-too-long failure modes, including retry logic that drops older context strategically.

- Compact boundaries are represented as explicit messages in the transcript.

- After compaction, the system rebuilds the post-compact message set in a structured way rather than just replacing everything with one summary blob.
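The last two bullets (explicit boundary messages, structured rebuild) can be sketched as follows. This is a toy version under loud assumptions: tokens are approximated as characters divided by 4, and the summarizer is passed in as a function. The message shape and the names `TranscriptMessage`, `compactTranscript`, and `maybeCompact` are hypothetical.

```typescript
interface TranscriptMessage {
  role: "user" | "assistant" | "system";
  kind: "normal" | "compact_boundary" | "summary";
  text: string;
}

// Assumption: tokens are approximated as characters / 4 for this sketch.
function estimateTokens(messages: TranscriptMessage[]): number {
  let chars = 0;
  for (const m of messages) chars += m.text.length;
  return Math.ceil(chars / 4);
}

// Replace older messages with [boundary, summary] and keep the recent tail
// verbatim, instead of collapsing everything into one summary blob.
function compactTranscript(
  messages: TranscriptMessage[],
  keepRecent: number,
  summarize: (older: TranscriptMessage[]) => string,
): TranscriptMessage[] {
  if (messages.length <= keepRecent) return messages;
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  return [
    { role: "system", kind: "compact_boundary", text: `compacted ${older.length} messages` },
    { role: "user", kind: "summary", text: summarize(older) },
    ...recent,
  ];
}

// Only compact when the estimated size exceeds the budget.
function maybeCompact(
  messages: TranscriptMessage[],
  tokenBudget: number,
  keepRecent: number,
  summarize: (older: TranscriptMessage[]) => string,
): TranscriptMessage[] {
  return estimateTokens(messages) > tokenBudget
    ? compactTranscript(messages, keepRecent, summarize)
    : messages;
}
```

Keeping the boundary as its own message means later code (and later compactions) can tell exactly where summarized history ends and verbatim history begins.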

Subagents

According to Codex, Claude Code has a serious subagent system. It is not just “spawn another model call.”

Subagents in Claude Code are:
- first-class tools
- configurable by type
- permission-aware
- tool-restricted
- integrated with the main query loop
- aware of context, prompt caching, and session state

There are also two different ideas mixed into the design:
- fresh specialized agents with their own prompt and tool restrictions
- forked agents that inherit the parent’s context and are optimized for prompt-cache reuse

Core Idea

Claude Code exposes subagents through the `Agent` tool.

That means the main model can explicitly decide:
- whether to delegate
- what kind of agent to spawn
- whether to spawn a specialized fresh agent
- whether to fork itself
- whether to run the agent in background
- whether to isolate the agent in a worktree or remote environment

This is much more structured than “just call another model.”

Two Main Kinds Of Subagents

1. Fresh specialized agents

These are agents started with a `subagent_type`.

Examples in the source:
- `Explore`
- `Plan`
- `general-purpose`
- `verification`

These agents can have:

- their own system prompt
- their own allowed/disallowed tools
- their own model preference
- their own frontmatter-defined behavior
- their own MCP server requirements

These agents do **not** rely on vague delegation. Claude Code’s prompt explicitly tells the parent agent to write a proper brief for them.

2. Forked agents

When forking is enabled, calling `Agent` without a `subagent_type` creates a fork.

Forks are different from fresh specialized agents because they:

- inherit the parent conversation context
- are designed to share prompt cache
- keep the main thread cleaner by offloading noisy intermediate work
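The difference between the two kinds comes down to what the child starts with. Here is a sketch, assuming forking is enabled. `AgentRequest`, `SpawnedAgent`, `agentPrompts`, and `spawnAgent` are hypothetical, and only the `subagent_type` field name and the agent type names come from the article.

```typescript
interface AgentRequest {
  prompt: string;
  subagent_type?: string; // e.g. "Explore" or "Plan"; omitted means fork
}

interface SpawnedAgent {
  mode: "fresh" | "fork";
  systemPrompt: string;
  context: string[]; // messages the child starts with
}

// Hypothetical registry of specialized agent prompts.
const agentPrompts: Record<string, string> = {
  Explore: "You are a read-only code explorer.",
  Plan: "You design implementation plans.",
};

function spawnAgent(
  request: AgentRequest,
  parentSystemPrompt: string,
  parentContext: string[],
): SpawnedAgent {
  if (request.subagent_type === undefined) {
    // Fork: same system prompt and same history as the parent, so the
    // already-cached prefix can be reused.
    return {
      mode: "fork",
      systemPrompt: parentSystemPrompt,
      context: [...parentContext, request.prompt],
    };
  }
  const systemPrompt = agentPrompts[request.subagent_type];
  if (systemPrompt === undefined) {
    throw new Error(`unknown agent type: ${request.subagent_type}`);
  }
  // Fresh agent: starts with zero context, so the brief must be self-contained.
  return { mode: "fresh", systemPrompt, context: [request.prompt] };
}
```

This also shows why a fresh agent needs a proper brief: the `prompt` string is literally all it knows about the task.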

This is a very important design choice. Claude Code is not just using subagents for intelligence; it is also using them for context hygiene.

Tool Restriction Model

One of the strongest things about Claude Code’s subagent system is that agents are not just prompt variants. They can have real tool constraints:

- `Explore` disallows edit/write/notebook-edit/agent nesting
- `Plan` does the same
- agent definitions can specify allowed tools and disallowed tools

This matters because it turns agents into actual roles, not just moods.
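In code, a tool restriction is just a filter applied to the tool pool before the agent ever sees it. This is a sketch with hypothetical names (`AgentToolPolicy`, `toolsForAgent`, `explorePolicy`). The tool names in the policy are my shorthand for the edit/write/notebook-edit/agent restrictions mentioned above, and letting the disallow list win over the allow list is my assumption.

```typescript
interface AgentToolPolicy {
  allowedTools?: string[];    // when set, only these tools are visible
  disallowedTools?: string[]; // always removed, even if also allowed
}

function toolsForAgent(allTools: string[], policy: AgentToolPolicy): string[] {
  return allTools.filter((tool) => {
    if (policy.disallowedTools?.includes(tool)) return false;
    if (policy.allowedTools !== undefined) return policy.allowedTools.includes(tool);
    return true;
  });
}

// Hypothetical policy modeled on the Explore description above.
const explorePolicy: AgentToolPolicy = {
  disallowedTools: ["Edit", "Write", "NotebookEdit", "Agent"],
};
```

Because the filter runs in the harness, a read-only agent cannot edit files even if its prompt is ignored or jailbroken. The tool simply is not there.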

Subagents can receive:

- a base/default agent prompt
- an agent-specific system prompt
- environment notes
- runtime-specific additions

So subagents have their own prompt stack, not merely the parent prompt copied blindly.

What The Main Prompt Says About Subagents

The agent tool prompt is one of the most educational parts of the system. -- Codex

The file is `src/tools/AgentTool/prompt.ts`

Important ideas from that prompt:
- use subagents when the task matches an agent description
- do not overuse them
- do not duplicate work already delegated
- forks are useful when intermediate output is not worth keeping in the main context
- fresh agents need a full brief because they start with zero context
- the parent should not fabricate results from a fork that has not finished yet

Claude Code is teaching the model not only how to delegate, but also how to avoid bad delegation patterns.

Notes about each subagent

1. Explore agent

- explicitly read-only
- optimized for searching codebases
- instructed to use search/read tools efficiently
- forbidden from file modifications
- encouraged to run parallel search/read operations

This is basically a specialized codebase reconnaissance agent.

2. Plan agent

- explicitly read-only
- focused on designing implementation plans
- forbidden from editing files
- expected to identify critical files and implementation sequence

This agent is not meant to code. It is meant to think structurally. -- Codex

3. Verification agent

Mentioned in the prompt and built-in agent set, this reflects another important idea:

- implementation and verification can be separated
- the main reporting agent should not be the only judge of correctness

That is a strong anti-self-deception design choice. -- Codex

Claude Code vs Codex Harness

I asked Codex to tell me how CC compares with Codex. Here is the file it produced, copy-pasted. All of this section is Codex generated.

This section compares the Claude Code harness in this repository with the Codex-style harness I am currently operating inside for this session. This is an objective architectural comparison, not a judgment of which is universally better.

- Claude Code is a **product runtime** with a large amount of in-product orchestration: hooks, permission modes, compact systems, memory systems, session metadata, dynamic tool surfaces, and user-facing workflow control.

- The Codex harness is closer to a **general agent execution shell**: simpler outer runtime, stronger turn-by-turn coding discipline, explicit developer-tool APIs, and less product-specific lifecycle machinery.

1. Claude Code is more productized; Codex is more execution-oriented

Claude Code’s source shows a harness designed for a polished end-user product:

- rich permission UX
- slash-command and skill model
- session-scoped hooks
- UI-oriented reminders and init messages
- long-session continuity systems
- feature-gated runtime variants

The Codex harness I’m using is more execution-centered:

- explicit tool namespaces
- stronger directness around making edits, running inspections, and reporting outcomes
- less emphasis on in-band product UX features like plan-mode approvals, skill surfacing, or session memory

Claude Code feels designed to be a durable interactive environment. Codex feels designed to be a strong software execution agent inside a simpler shell.

2. Claude Code invests more in runtime adaptation; Codex invests more in instruction discipline

Claude Code compensates for model/runtime complexity with many adaptive subsystems:
- compaction layers
- session memory
- hook pipelines
- permission classifiers
- tool deferral
- MCP instruction deltas

The Codex harness relies more heavily on:
- strong developer instructions
- explicit tool affordances
- clear editing constraints
- simpler stepwise execution discipline

Put differently:
- Claude Code says: “build a lot of runtime machinery around the model.”
- Codex says: “make the agent follow a stricter operating contract.”

Both are valid. They sit at different points on the runtime-versus-policy spectrum.

Subagent architectures

According to Codex:

Claude Code treats subagents as product-level roles exposed through one main `Agent` tool. Codex treats subagents as explicit harness-controlled worker processes exposed through multiple lifecycle tools. That is the strongest difference.

Claude Code Subagents

Claude Code exposes subagents mainly through a single high-level tool:

- `Agent`

The parent model says what kind of agent it wants, and Claude Code handles the rest inside the product runtime.

Codex Subagents

In this harness, subagents are exposed more explicitly through separate tools:

- `spawn_agent`
- `send_input`
- `wait_agent`
- `resume_agent`
- `close_agent`

Core model

Codex does not wrap subagents inside a single product abstraction like `Agent`.

Instead, it exposes the subagent lifecycle directly.

- spawning is explicit
- agent IDs are explicit
- waiting is explicit
- resuming is explicit
- closing is explicit
- you can route messages to an already-running subagent directly
- the harness expects the main agent to manage coordination more manually
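To illustrate what "explicit lifecycle" means for the parent agent, here is a toy in-memory pool with one method per lifecycle step. Only the tool names it mirrors (`spawn_agent`, `send_input`, `wait_agent`, `close_agent`) come from the list above. The class, its state machine, and the fake "result" string are hypothetical and say nothing about how Codex implements them.

```typescript
type SubagentState = "running" | "done";

interface SubagentRecord {
  id: string;
  state: SubagentState;
  inputs: string[];
}

class SubagentPool {
  private records = new Map<string, SubagentRecord>();
  private counter = 0;

  // Mirrors spawn_agent: the caller gets an explicit ID back.
  spawnAgent(task: string): string {
    this.counter += 1;
    const id = `agent-${this.counter}`;
    this.records.set(id, { id, state: "running", inputs: [task] });
    return id;
  }

  // Mirrors send_input: only a running agent accepts more input.
  sendInput(id: string, text: string): void {
    const record = this.mustGet(id);
    if (record.state !== "running") throw new Error(`${id} is not running`);
    record.inputs.push(text);
  }

  // Mirrors wait_agent. Toy behavior: mark the agent done and report
  // how many inputs it received instead of doing real work.
  waitAgent(id: string): string {
    const record = this.mustGet(id);
    record.state = "done";
    return `${id} handled ${record.inputs.length} inputs`;
  }

  // Mirrors close_agent: the ID is invalid afterwards.
  closeAgent(id: string): void {
    this.mustGet(id);
    this.records.delete(id);
  }

  private mustGet(id: string): SubagentRecord {
    const record = this.records.get(id);
    if (record === undefined) throw new Error(`unknown agent: ${id}`);
    return record;
  }
}
```

Every step is a separate call keyed by ID, so the parent has to remember IDs and call things in a valid order. That is the coordination burden the bullets above describe.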

Claude Code says: “Here is one high-level delegation tool. Think in terms of agent roles. Let the runtime handle much of the structure.”

So the parent model mostly thinks in product terms:

- use `Explore`
- use `Plan`
- fork for research
- background this task

Codex says: “Here are the primitives for managing subagents. You coordinate them explicitly. The harness gives you direct lifecycle control.”

So the parent agent thinks more in operator terms:

- spawn this worker
- send more input
- wait for completion
- close when done

Pros and cons through an objective lens (Codex written)

Claude Code is a heavier, more productized agent operating system. Codex is a leaner, stricter coding-agent harness. -- Codex 😭

Dump of prompts

Important files
