Hermes Agent FULL GUIDE: Architecture, Setup, and the Self-Improving Loop

There's a new category of AI tooling quietly taking shape: agents that don't live in a chat window you open and close, but run continuously in the cloud and talk to you through a messenger, like a coworker who never logs off.
Hermes is one of the more interesting implementations of this idea, and what sets it apart from comparable agents like OpenClaw is a built-in self-improving loop – a system that watches your conversations, extracts useful patterns from them, and turns those patterns into permanent upgrades to its own memory and skill set.
This piece walks through how Hermes is put together, how to configure it, and how that self-improvement loop actually works under the hood.
What Hermes is, and how it differs from OpenClaw
Hermes is a cloud-resident AI agent, structurally similar to OpenClaw: it runs 24/7 and you interact with it through a messaging app rather than a terminal or browser tab.
The meaningful differences are threefold.
First, Hermes ships with a much larger library of built-in skills out of the box, so you spend less time wiring up integrations yourself.
Second, the setup process is considerably more streamlined – a guided TUI handles almost everything.
Third, and most importantly, Hermes is designed around continuous self-improvement: it doesn't just execute tasks, it accumulates procedural knowledge about how to execute them better over time.
Installation and Initial Setup
Getting Hermes running takes a single command.
On Windows, you run this in PowerShell:
iex (irm https://hermes-agent.nousresearch.com/install.ps1)
On Linux, macOS, or WSL, the equivalent is:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
Once installed, restarting the terminal and running hermes setup launches a guided configuration flow that walks through model selection, terminal backend, messaging gateway, and tool setup in sequence.

Choosing and Routing Models

The first real decision in setup is which LLM provider powers the agent's "brain." Authentication happens via OAuth rather than raw API keys, which extends to being able to log in through an existing Claude Code or Codex CLI session rather than generating a separate API key.
What's genuinely well-designed here is how Hermes separates the model used for your main conversation from the models used for background and auxiliary tasks. By default, the same model handles both, but each auxiliary task can be pointed at a different provider independently.
The tasks that support this kind of override are:
This is configured directly in config.yaml, for example:
This kind of explicit routing solves a real problem with OpenRouter as a default choice: the same nominal model is often deployed by many different providers, frequently in different quantizations, and OpenRouter will silently shuffle each new request between roughly twenty of them.
The practical effect is that within a single session, you're not talking to one consistent model – you're talking to a rotating cast of differently-configured instances of it, some of which handle tool calls and prompt templates more reliably than others. Routing manually inside Hermes avoids this entirely.

It's also worth noting that if you want to save money on the conversational model without sacrificing coding quality, Hermes supports /claude_code and /codex commands that delegate coding tasks directly to those CLI tools rather than handling them with the configured chat model.

Terminal Backends

A core piece of the architecture is the Terminal Backend Environment, which determines where and how shell commands and Python scripts actually execute, and how the agent touches your filesystem. Hermes supports five.
Local is the default. Commands run directly on your machine with the same permissions as your user account – no isolation. It's the right choice for local development and trusted personal use where you want the agent editing your actual project files.
Safety here relies entirely on a built-in approvals system that intercepts destructive commands (an rm -rf /, a DROP TABLE) and asks for explicit permission before running them.
Docker runs the agent inside an isolated sandbox so it can't touch your host system. SSH has the agent execute commands and work with files on a remote server over a remote connection. Modal runs everything in serverless cloud sandboxes – you're essentially renting compute by the second, paying only for the actual seconds your code runs.
Daytona is a container-management layer purpose-built for AI coding agents; it's faster than running Docker directly and handles environment setup and dependency installation automatically.
For most personal use cases, Local is genuinely sufficient – the other options matter mainly if you're running untrusted code or operating at team scale.
Messaging Gateway and Tool Configuration

After the terminal backend, setup moves to choosing where you'll actually talk to the agent – Telegram being the most polished option. Selecting it gives you a direct link that spins up a pre-configured bot; there's no manual bot-token setup involved.



The remainder of setup walks through enabling individual tools and their respective providers – browser automation, image generation, text-to-speech, and web search. For web search specifically, self-hosted Firecrawl or Exa stand out as strong choices for agent-oriented scraping and retrieval.




X search requires a Grok subscription to enable, which is worth knowing before you go looking for it in the menu.

Slash commands worth knowing
Hermes ships with a long list of slash commands, most self-explanatory by name, but a handful are worth calling out specifically.
On the development side /github_pr_workflow handles the full branch-to-merge cycle including CI, /github_code_reviewreviews pull requests, and /codebase_inspection analyzes a repository's language breakdown and line counts. /dogfood is a dedicated QA mode that hunts for bugs in a web app and produces an evidence-backed report. /spike runs a quick, throwaway experiment to validate an idea before committing to full development, and /systematic_debugging works through bugs in four phases, understanding root cause before attempting a fix.
There's also a cluster of integration-specific commands – /notion, /obsidian, /airtable, /google_workspace, /arxiv, /blogwatcher, /polymarket, /ocr_and_documents, /youtube_content – each wrapping a specific external service or workflow, plus /bundles, which groups several existing skills under one slash command via small YAML configuration files.
Cron jobs and Webhooks
Two automation primitives deserve particular attention.
Context Engines
The context engine governs how Hermes compresses and manages conversation history once it approaches the model's token limit, and there are two options.

Memory Engines
External memory providers run alongside Hermes's built-in local memory files, MEMORY.md and USER.md, adding capabilities like semantic search and knowledge graphs.
Several can be configured directly through the setup TUI.
For day-to-day use, the default local memory is genuinely adequate for most people – the heavier systems trade real resource cost, especially RAM for locally hosted options, for capability that most workflows don't yet need.
The Self-Improving Loop
This is the feature that most distinguishes Hermes from a conventional agent: a set of asynchronous background processes that continuously analyze your conversations, extract useful patterns from them, and write those patterns into long-term memory and procedural memory (skills) – then maintain that accumulated knowledge so it doesn't decay over time. The whole system runs in parallel with your main chat and is built from three components: a trigger system, a background review agent, and a curator.
Hermes doesn't analyze every message in real time, since that would burn tokens for no benefit. Instead, it relies on two counters that trigger a reflection pass once they cross a threshold.
A memory trigger fires every ten user prompts, checking whether new facts worth saving have appeared in the conversation.
A skill trigger fires every ten tool-call iterations within a single turn, on the theory that if the agent just spent that many steps fighting through a problem by trial and error, that experience is worth analyzing and possibly turning into a reusable skill.
Once either counter hits its limit, an internal function fires, handing off a snapshot of the current conversation to a background review process.
This snapshot goes to a fully separate, isolated agent process that runs in parallel without interrupting your main session. It works in two directions.
For the curator to eventually judge which of these self-generated skills are actually worth keeping, Hermes maintains a hidden usage log tracking, for every skill: how many times it's been loaded into a prompt, how many times the agent has opened it to read it, how many times it's been edited, and timestamps for creation, last use, and last edit.
Left unchecked, this process can eventually produce hundreds of skills, some redundant, some outdated.
The curator exists to keep that knowledge base from degrading. It only starts when two conditions hold simultaneously: enough time has passed since its last run (seven days, by default), and the main agent has been idle long enough (two hours, by default) that a heavy maintenance pass won't interfere with active work.
Before making any changes, it automatically backs up the entire skills directory, so any unsatisfactory result can be rolled back through a single terminal command.
The curator's work happens in two phases:
For each skill, the curator decides to keep it as-is if it's still accurate and useful, fix it if it contains errors or outdated methods, merge it with another skill covering substantially the same ground (correctly relocating any associated scripts, evals, or reference files and rewriting relative paths in the process), or archive it outright.
At the end of the cycle, it produces a detailed report including a rename map showing exactly how old skill names mapped to new ones after any merges, so the reasoning behind every decision is fully auditable.
Using Hermes well
Cloud agents like this are genuinely valuable for any process you can run 24/7 – coding work being the notable exception – provided you've actually digitized that process carefully and built a solid skill around it, including evaluations.
The workflow that tends to produce good results looks something like this:
The broader point is that the range of things you can hand to an agent like this is limited mainly by how well you can specify the work, not by the agent's raw capability.
Three principles seem to hold up across use cases: don't outsource coding work to an unsupervised 24/7 cloud agent, keep a human in the loop reviewing what the agent actually produces, and treat skill refinement as ongoing work rather than something you finish once and walk away from.
If this was useful – bookmark it. You'll want to come back to it.
For more breakdowns like this follow @ScottyBeamIO
No fluff, just what actually works.

