{"id":"2067375373449605381","url":"https://x.com/trevin/status/2067375373449605381","text":"","author":{"name":"Trevin Chow","username":"trevin","avatarUrl":"https://pbs.twimg.com/profile_images/2049354574516178944/OKHe6Ocu_200x200.jpg"},"createdAt":"Wed Jun 17 22:34:33 +0000 2026","engagement":{"replies":5,"retweets":3,"likes":35,"views":4179},"article":{"title":"RTK, Headroom, and the right way to save tokens in coding agents","previewText":"I saw that @Teknium posted a useful Hermes Agent report on Headroom, a repo that promises big token savings for agents.  I asked my Hermes to run a similar analysis on RTK (Rust Token Killer) a","coverImageUrl":"https://pbs.twimg.com/media/HLCeF0fasAAI8wJ.jpg","content":"I saw that @Teknium posted a useful Hermes Agent report on Headroom, a repo that promises big token savings for agents. I asked my Hermes to run a similar analysis on RTK (Rust Token Killer), a popular CLI proxy that promises token usage reduction for LLMs. This was its report:\n\nHi Trevin, I looked at the @Teknium [report](https://x.com/Teknium/status/2067292705710031117?s=20) and the RTK repo.\n\nThe important part of that report was not “compression good” or “compression bad.” It was the mechanism-level analysis.\n\nHeadroom's big advertised numbers mostly come from CCR: replace content with a marker, cache the original locally, retrieve it later if needed.\n\nThat sounds reasonable until you put it inside a coding agent.\n\nThe agent often reads tool output because it needs the tool output. If you replace the output with a marker, the agent retrieves it. Now the context has both the marker and the retrieved blob.\n\nYou did not save tokens. You added an errand.\n\nTeknium's conclusion was basically: the generic remove-and-retrieve path is a bad fit for live Hermes tool output, but the evaluation found one real free win. 
`search_files` output could be densified losslessly inside Hermes itself.\n\nThat is the right shape of analysis: do not argue about the marketing number. Inspect the mechanism, run it against real agent traffic, and ship the small native win if that is what survives.\n\nSo I looked at another token-savings repo: `rtk-ai/rtk`.\n\nRTK is a different beast.\n\nIt is not trying to compress arbitrary agent context after the fact. It is a command-aware CLI proxy.\n\nInstead of:\n\nRTK tries to do:\n\nSame for a lot of common dev commands:\n\n- git status / diff / log / commit / push\n\n- gh pr / issue / run\n\n- cargo test / pytest / go test / jest / vitest\n\n- rg / grep / find / ls / cat/head/tail\n\n- docker / kubectl / aws / package managers\n\nThat difference matters.\n\nFor coding agents, command-aware output shaping is much more plausible than generic compression. The useful output of `cargo test` is not shaped like the useful output of `git diff`. The useful output of `gh pr view` is not shaped like a log file.\n\nRTK's basic idea is right: make the command return the thing the agent probably needed in the first place.\n\nI cloned the repo and inspected the current `develop` branch.\n\nSome quick facts:\n\n- ~74k Rust LOC under `src/`\n\n- 62 command-module files\n\n- 74 rewrite rules\n\n- 58 built-in TOML filters\n\n- 2,213 Rust `#[test]` annotations\n\n- Hermes integration exists via a `pre_tool_call` plugin\n\nThis is not just a README with a shell alias.\n\nI also did a small safe evaluation. No changes to my active Hermes install, no gateway restart, no global RTK install.\n\nI downloaded the RTK `v0.42.4` macOS ARM release into `/tmp`, verified the SHA256 against the release checksum, put the binary on a temporary `PATH`, and ran it with a temporary `HOME`/`XDG_DATA_HOME`. 
I did not run `rtk init` except in dry-run mode.\n\nThen I copied RTK's Hermes plugin into the sandbox and smoke-tested it with a fake Hermes hook context.\n\nThe plugin did what the source suggested:\n\n- registers `pre_tool_call`\n\n- rewrites `terminal` commands\n\n- leaves non-terminal tools alone\n\nExample:\n\nThat boundary is important.\n\nRTK's Hermes integration only touches Hermes `terminal` calls. It does not touch Hermes-native tools like `read_file`, `search_files`, `skill_view`, `web_extract`, browser snapshots, or LCM/context compression.\n\nSo RTK may save a lot of tokens on supported shell commands. That does not mean it saves 60-90% of a full Hermes session.\n\nTo get a rough real-world signal, I sampled recent Hermes terminal tool calls from the local session DB in read-only mode. I did not execute historical commands. I only passed the command strings to `rtk rewrite`.\n\nResults from 818 recent terminal commands:\n\n- 108 were rewritten by RTK\n\n- 710 passed through unchanged\n\n- rewrite hit rate: 13.2%\n\n- median rewrite latency: 9.8ms\n\n- p95 rewrite latency: 13.2ms\n\nThat is not a universal benchmark. It is one user's Hermes usage pattern.\n\nBut it matters because the command mix was very Hermes-realistic: a lot of shell scripts, Python snippets, bespoke local CLIs, `gbrain`, `hermes`, `x-twitter-pp-cli`, and other orchestration commands. RTK's strongest surface is common developer CLI output. If your agent spends most of its time in custom shell glue, the rewrite hit rate will be lower.\n\nI also ran a small controlled before/after benchmark in the RTK repo clone. These are character counts, not tokenizer-accurate token counts, but they are enough to see the shape.\n\nThis is the key point: RTK can be very good when the command/filter pair is good. It is not automatically good just because the command is technically supported.\n\n`git status` and `find` compressed well. `git log` and `git show --stat` did not move in this case. 
`grep` was slightly worse.\n\nThat does not make RTK bad. It makes the real claim narrower and more useful.\n\nCompared with Headroom's CCR path, RTK avoids the biggest structural problem: there is no marker that the model has to retrieve back into context. The compact output is the output.\n\nDifferent tradeoff though: RTK is lossy.\n\nFor many commands, that is fine.\n\nPassing tests do not need 1,000 lines of green checkmarks. Install logs do not need every “downloaded package” line. `git status` does not need a paragraph when a compact file list works.\n\nBut lossy command wrappers can also hide the one line that matters.\n\nThat is where the repo still needs more proof.\n\nA few concerns from inspection:\n\n1. The README's 30-minute Claude Code savings table is presented as an estimate, not a reproducible benchmark over real sessions.\n\n1. The repo description says “single Rust binary, zero dependencies,” but the source build has 21 Cargo dependencies. If they mean no runtime service dependency, fine. If they mean no dependencies, no.\n\n1. Open issue #2468 says `rtk gain` can over-count savings after a huge-file read failure/OOM path. That matters because the savings dashboard is part of the trust story.\n\n1. Open issue #2462 reports `rtk grep` silently returning `0 files` on macOS when ripgrep is missing because BSD grep does not behave like GNU grep for the delimiter RTK expects. Silent false negatives are exactly the kind of failure agents are bad at noticing.\n\n1. Open issue #2469 notes `rtk find` does not support compound predicates/actions like `-not` and `-exec`. 
That is not fatal, but rewrite layers need to be conservative around shell semantics.\n\nMy read:\n\nRTK is promising because it is solving the right problem at the right layer for shell commands.\n\nBut the public number needs the same treatment Teknium gave Headroom.\n\nDo not ask “does RTK save 80% on examples where RTK is used?”\n\nAsk:\n\n> Across real Hermes sessions, after unsupported commands, native tool calls, reruns, fallbacks, and correctness checks, how many net input tokens did RTK save?\n\nIn my small sample, the honest answer is: RTK rewrote 13.2% of recent Hermes terminal commands, and it produced large savings on some controlled commands but zero or negative savings on others.\n\nThat is still useful. It is just not the README headline.\n\nThe ideal outcome is probably both:\n\n- RTK-style command-aware filtering for shell commands\n\n- Hermes-native densification for Hermes-native tools\n\nThat is the path that actually compounds.\n\nMake the common outputs smaller at the source. Keep the details recoverable when they matter. Measure net savings on real traffic, not marketing examples.\n\nThat is the bar."}}