RTK, Headroom, and the right way to save tokens in coding agents

I saw that @Teknium posted a useful Hermes Agent report on Headroom, a repo that promises big token savings for agents. I asked my Hermes to run a similar analysis on RTK (Rust Token Killer), a popular CLI proxy that promises to reduce token usage for LLMs. This was its report:
Hi Trevin, I looked at the @Teknium report and the RTK repo.
The important part of that report was not “compression good” or “compression bad.” It was the mechanism-level analysis.
Headroom's big advertised numbers mostly come from CCR: replace content with a marker, cache the original locally, retrieve it later if needed.
That sounds reasonable until you put it inside a coding agent.
The agent often reads tool output because it needs the tool output. If you replace the output with a marker, the agent retrieves it. Now the context has both the marker and the retrieved blob.
You did not save tokens. You added an errand.
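To make the arithmetic concrete, here is a minimal sketch of the marker-and-retrieve cost. The token counts are made up for illustration and the function names are mine, not Headroom's. It also ignores the tokens spent on the retrieval tool call itself, which would only make the retrieved case worse.

```python
def context_cost(original_tokens: int, marker_tokens: int, retrieved: bool) -> int:
    """Tokens that land in context for one tool output under marker-and-retrieve."""
    if retrieved:
        # The marker stays in context and the full blob comes back next to it.
        return marker_tokens + original_tokens
    return marker_tokens


def expected_cost(original_tokens: int, marker_tokens: int, p_retrieve: float) -> float:
    """Average cost when the agent retrieves the blob with probability p_retrieve."""
    return marker_tokens + p_retrieve * original_tokens


original, marker = 2000, 30  # illustrative numbers only

print(context_cost(original, marker, retrieved=False))  # 30
print(context_cost(original, marker, retrieved=True))   # 2030, worse than the 2000 baseline
print(expected_cost(original, marker, p_retrieve=0.9))  # 1830.0
```

The scheme only pays off when the retrieval probability stays below `1 - marker / original`. For output the agent asked for on purpose, that probability tends to sit close to 1.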
Teknium's conclusion was basically: the generic remove-and-retrieve path is a bad fit for live Hermes tool output, but the evaluation found one real free win. `search_files` output could be densified losslessly inside Hermes itself.
That is the right shape of analysis: do not argue about the marketing number. Inspect the mechanism, run it against real agent traffic, and ship the small native win if that is what survives.
So I looked at another token-savings repo: `rtk-ai/rtk`.
RTK is a different beast.
It is not trying to compress arbitrary agent context after the fact. It is a command-aware CLI proxy.
Instead of:
RTK tries to do:
Same for a lot of common dev commands:
That difference matters.
For coding agents, command-aware output shaping is much more plausible than generic compression. The useful output of `cargo test` is not shaped like the useful output of `git diff`. The useful output of `gh pr view` is not shaped like a log file.
RTK's basic idea is right: make the command return the thing the agent probably needed in the first place.
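As a rough illustration of what "command-aware" means, here is a toy dispatcher with one filter per command family. This is my own sketch, not RTK's code: the filter rules, the function names, and the sample outputs are all assumptions.

```python
from typing import Callable, Dict


def shape_test_output(raw: str) -> str:
    """Keep failing lines and the final summary line; drop passing lines."""
    lines = raw.splitlines()
    kept = [ln for ln in lines if "FAILED" in ln or "panicked" in ln]
    if lines and lines[-1] not in kept:
        kept.append(lines[-1])  # the summary usually sits on the last line
    return "\n".join(kept)


def shape_status(raw: str) -> str:
    """Drop blank lines and the '(use "git ...")' hint lines."""
    kept = []
    for ln in raw.splitlines():
        stripped = ln.strip()
        if stripped and not stripped.startswith("(use"):
            kept.append(stripped)
    return "\n".join(kept)


# Hypothetical registry: command prefix -> filter for that command's output.
SHAPERS: Dict[str, Callable[[str], str]] = {
    "cargo test": shape_test_output,
    "git status": shape_status,
}


def shape(command: str, raw: str) -> str:
    """Apply the matching filter, or pass the output through untouched."""
    for prefix, shaper in SHAPERS.items():
        if command.startswith(prefix):
            return shaper(raw)
    return raw
```

The point of the structure is that each filter encodes what matters for one command. A generic compressor has no way to know that a passing test line is noise and a failing one is signal.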
I cloned the repo and inspected the current `develop` branch.
Some quick facts:
This is not just a README with a shell alias.
I also did a small safe evaluation. No changes to my active Hermes install, no gateway restart, no global RTK install.
I downloaded the RTK `v0.42.4` macOS ARM release into `/tmp`, verified the SHA256 against the release checksum, put the binary on a temporary `PATH`, and ran it with a temporary `HOME`/`XDG_DATA_HOME`. I did not run `rtk init` except in dry-run mode.
Then I copied RTK's Hermes plugin into the sandbox and smoke-tested it with a fake Hermes hook context.
The plugin did what the source suggested:
Example:
That boundary is important.
RTK's Hermes integration only touches Hermes `terminal` calls. It does not touch Hermes-native tools like `read_file`, `search_files`, `skill_view`, `web_extract`, browser snapshots, or LCM/context compression.
So RTK may save a lot of tokens on supported shell commands. That does not mean it saves 60-90% of a full Hermes session.
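The boundary can be pictured as a hook that checks the tool name before it does anything else. This is a hypothetical sketch: the dict shape of the tool call, the `maybe_rewrite` name, and the placeholder rewriter are my assumptions, not the actual Hermes hook API or RTK plugin code.

```python
from typing import Callable, Optional


def maybe_rewrite(call: dict, rewrite: Callable[[str], Optional[str]]) -> dict:
    """Rewrite the command of a terminal call; leave every other tool call alone."""
    if call.get("tool") != "terminal":
        return call  # read_file, search_files, web_extract, ... pass through
    command = call.get("args", {}).get("command")
    if not command:
        return call
    rewritten = rewrite(command)
    if rewritten is None:
        return call  # unsupported command: run it as written
    return {**call, "args": {**call["args"], "command": rewritten}}


def placeholder_rewrite(command: str) -> Optional[str]:
    """Stand-in rewriter. The 'proxy ' prefix is a placeholder, not RTK's real output."""
    return "proxy " + command if command.startswith("git ") else None


terminal_call = {"tool": "terminal", "args": {"command": "git status"}}
native_call = {"tool": "read_file", "args": {"path": "README.md"}}

print(maybe_rewrite(terminal_call, placeholder_rewrite)["args"]["command"])  # proxy git status
print(maybe_rewrite(native_call, placeholder_rewrite) == native_call)        # True
```

Everything that does not pass through the first branch is outside RTK's reach, which is why per-command savings and per-session savings are different numbers.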
To get a rough real-world signal, I sampled recent Hermes terminal tool calls from the local session DB in read-only mode. I did not execute historical commands. I only passed the command strings to `rtk rewrite`.
Results from 818 recent terminal commands:
That is not a universal benchmark. It is one user's Hermes usage pattern.
But it matters because the command mix was very Hermes-realistic: a lot of shell scripts, Python snippets, bespoke local CLIs, `gbrain`, `hermes`, `x-twitter-pp-cli`, and other orchestration commands. RTK's strongest surface is common developer CLI output. If your agent spends most of its time in custom shell glue, the rewrite hit rate will be lower.
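The hit-rate measurement itself is simple enough to sketch. The version below takes any rewriter that returns `None` for unsupported commands. In my run that role was played by `rtk rewrite`, but I am not reproducing its exact output or exit-code conventions here, so the rewriter and the sample commands are stand-ins.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple


def hit_rate(
    commands: Iterable[str],
    rewrite: Callable[[str], Optional[str]],
) -> Tuple[float, Counter]:
    """Return the share of commands that were rewritten, plus misses by first word."""
    total = 0
    hits = 0
    misses: Counter = Counter()
    for cmd in commands:
        total += 1
        if rewrite(cmd) is not None:
            hits += 1
        else:
            words = cmd.split()
            misses[words[0] if words else ""] += 1
    rate = hits / total if total else 0.0
    return rate, misses


def stand_in_rewrite(cmd: str) -> Optional[str]:
    """Illustrative rewriter: pretend only git commands are supported."""
    return "proxy " + cmd if cmd.startswith("git ") else None


sample = ["git status", "python3 script.py", "gbrain query x", "git diff", "hermes run"]
rate, misses = hit_rate(sample, stand_in_rewrite)
print(rate)  # 0.4
```

Grouping the misses by first word shows quickly whether the unsupported share is dominated by custom CLIs and scripts or by common tools that could plausibly get a filter.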
I also ran a small controlled before/after benchmark in the RTK repo clone. These are character counts, not tokenizer-accurate token counts, but they are enough to see the shape.
This is the key point: RTK can be very good when the command/filter pair is good. It is not automatically good just because the command is technically supported.
`git status` and `find` compressed well. `git log` and `git show --stat` did not move in this case. `grep` was slightly worse.
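For reference, the comparison metric is nothing more than a relative change in character count. A minimal sketch, with made-up strings standing in for real command output:

```python
def reduction(before: str, after: str) -> float:
    """Fraction of characters removed. Negative means the output got longer."""
    if not before:
        return 0.0
    return 1 - len(after) / len(before)


print(reduction("a" * 100, "a" * 25))   # 0.75
print(reduction("a" * 100, "a" * 100))  # 0.0
print(reduction("a" * 100, "a" * 105) < 0)  # True, the "slightly worse" case
```

Character counts only approximate tokens, since tokenizers treat paths, hashes, and whitespace differently from prose. They are good enough to tell a large reduction from no change, but not to settle small differences.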
That does not make RTK bad. It makes the real claim narrower and more useful.
Compared with Headroom's CCR path, RTK avoids the biggest structural problem: there is no marker that the model has to retrieve back into context. The compact output is the output.
Different tradeoff though: RTK is lossy.
For many commands, that is fine.
Passing tests do not need 1,000 lines of green checkmarks. Install logs do not need every “downloaded package” line. `git status` does not need a paragraph when a compact file list works.
But lossy command wrappers can also hide the one line that matters.
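One cheap way to test for that failure mode is to diff the raw and compact outputs for lines that look important. The sketch below is my own suggestion, not something I found in the RTK repo, and the pattern list is an assumption that would need tuning per command.

```python
import re
from typing import List

# Assumed markers of lines an agent should not lose.
CRITICAL = re.compile(r"error|warning|FAILED|panic", re.IGNORECASE)


def dropped_critical_lines(raw: str, compact: str) -> List[str]:
    """Return important-looking lines from raw output that are missing from the compact one."""
    dropped = []
    for line in raw.splitlines():
        text = line.strip()
        if CRITICAL.search(text) and text not in compact:
            dropped.append(text)
    return dropped


raw = "Compiling foo\nwarning: unused variable `x`\nFinished dev"
print(dropped_critical_lines(raw, "Finished dev"))  # ['warning: unused variable `x`']
print(dropped_critical_lines(raw, raw))             # []
```

Run over recorded raw/compact pairs, a non-empty result flags the filters that need a closer look before anyone trusts their savings number.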
That is where the repo still needs more proof.
A few concerns from inspection:
My read:
RTK is promising because it is solving the right problem at the right layer for shell commands.
But the public number needs the same treatment Teknium gave Headroom.
Do not ask “does RTK save 80% on examples where RTK is used?”
Ask:
Across real Hermes sessions, after unsupported commands, native tool calls, reruns, fallbacks, and correctness checks, how many net input tokens did RTK save?
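That question can be written down as an accounting rule over every tool call in a session, not just the rewritten ones. A minimal sketch, assuming per-call token counts are available. The `CallRecord` fields and the numbers are hypothetical, not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    baseline_tokens: int   # what the raw output would have cost
    actual_tokens: int     # what actually went into context
    rerun_tokens: int = 0  # extra cost of reruns or fallbacks caused by missing detail


def net_savings(records: List[CallRecord]) -> float:
    """Net fraction of input tokens saved across all calls, touched or not."""
    baseline = sum(r.baseline_tokens for r in records)
    spent = sum(r.actual_tokens + r.rerun_tokens for r in records)
    return (baseline - spent) / baseline if baseline else 0.0


session = [
    CallRecord(1000, 200),                   # rewritten, compressed well
    CallRecord(500, 500),                    # unsupported command, untouched
    CallRecord(800, 100, rerun_tokens=800),  # rewritten, but the agent reran it raw
    CallRecord(2000, 2000),                  # native tool call, untouched
]

print(round(net_savings(session), 3))  # 0.163
```

In this toy session the two rewritten calls shrink from 1,800 to 300 tokens on their own, about 83%. Once the rerun and the untouched calls are counted, the net figure is about 16%. The gap between those two numbers is exactly what a real evaluation has to measure.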
In my small sample, the honest answer is: RTK rewrote 13.2% of recent Hermes terminal commands, and it produced large savings on some controlled commands but zero or negative savings on others.
That is still useful. It is just not the README headline.
The ideal outcome is probably both:
That is the path that actually compounds.
Make the common outputs smaller at the source. Keep the details recoverable when they matter. Measure net savings on real traffic, not marketing examples.
That is the bar.

