Monday morning. You open a new session with your AI agent and ask it to pick up where you left off on Friday. It has no idea what you're talking about. You explain the project again. The codebase structure. The naming conventions. The specific deployment quirk that took three hours to figure out last week. The agent listens patiently, processes everything — and then gives you the same wrong suggestion it gave on Wednesday.
This is not a bug. It's how almost every AI agent on the market is designed. Session ends, slate clears, you start from zero. Every. Single. Time.
Hermes Agent is built around one specific objection to that design: what if the agent actually kept what it learned?
What Hermes Agent Actually Is
Hermes Agent is an open-source AI agent built by Nous Research — the same lab behind the Hermes, Nomos, and Psyche model families. It launched in February 2026 under the MIT license. As of late April 2026, it has crossed 87,000 GitHub stars, which is notable for a project barely two months old.
It is not a chatbot. Not an IDE plugin. Not a wrapper around a single model API.
The clearest description: a persistent, self-hosted AI agent that lives on your server, maintains memory across sessions, builds reusable skills from its own completed tasks, and connects to you through whatever messaging platform you already use — Telegram, Discord, Slack, WhatsApp, Signal, or email. You give it a task, it executes using 40+ built-in tools, and after it succeeds, it can write down exactly what it did so it doesn't have to figure it out again next time.
That last part is the architecture that makes it different from the field.
The Problem It's Solving
Every AI agent you've used — Claude Code, Codex, any OpenClaw setup — has the same fundamental design: stateless by default. The session ends and the agent's working memory resets. Whatever it figured out about your project, your preferences, the specific edge case it navigated last Tuesday — gone.
Workarounds exist. You can paste context into every new session. You can maintain a giant CONTEXT.md that you feed in manually. Some tools now have lightweight "memory" features that store a few notes. These help. They don't solve it.
The deeper problem is not just remembering facts. It's that the agent doesn't retain what worked. You teach it a non-obvious approach to your deployment pipeline on Monday. Wednesday rolls around and it tries a different approach from scratch, hits the same dead end it avoided before, and you have to walk it through the fix again. No accumulation of procedural knowledge. No skill transfer from one session to the next.
Hermes is built specifically around that gap. After a complex task completes — defined as five or more tool calls, a recovery from an error, or a non-obvious workflow — it evaluates whether the approach is worth capturing. If it is, it writes a skill file. Not a log. A reusable instruction set.
How the Learning Loop Works
The learning loop is the architecture that separates Hermes from everything else. It has four stages, and they run in sequence after every non-trivial task.
Execute. The agent receives a task, decomposes it, selects tools, and runs. This part looks identical to any other agent framework — plan, act, observe, repeat until done.
Evaluate. After execution, the agent checks the outcome. Did it succeed? Did the user accept the result without edits? Did the user correct something? Both explicit corrections (you change its output) and implicit acceptance (you move forward without changing anything) feed into this evaluation. The agent builds a signal from your behavior, not just from its own self-assessment.
Extract. When the evaluation passes a threshold — task succeeded, approach was non-obvious, path involved recoveries or multiple steps — the agent abstracts the reasoning pattern into a skill document. The skill captures: what the task type looks like, what approach worked, what pitfalls to avoid, what tools to reach for first. Stored as a markdown file in ~/.hermes/skills/.
Refine. Skills aren't static. When Hermes encounters similar tasks in future sessions, it compares the new outcome to the existing skill. If a better approach consistently outperforms the stored one, the skill updates. If your preferences shift over time, the skill adapts. The agent's skill library is a living document, not a snapshot from day one.
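The extract-stage threshold described above can be sketched as a simple predicate. This is an illustrative reconstruction, not Hermes source code: the function and field names are assumptions, and the only details taken from the documentation are the criteria themselves (five or more tool calls, an error recovery, or a non-obvious workflow, gated on success and user acceptance).

```python
from dataclasses import dataclass

@dataclass
class TaskOutcome:
    succeeded: bool            # did the task complete?
    user_corrected: bool       # did the user edit the result afterward?
    tool_calls: int            # how many tool invocations the run took
    recovered_from_error: bool # did the agent hit and escape a dead end?
    non_obvious: bool          # flagged as a non-obvious workflow

def should_extract_skill(outcome: TaskOutcome) -> bool:
    """Gate skill creation on success plus implicit acceptance,
    then require at least one 'worth capturing' signal."""
    if not outcome.succeeded or outcome.user_corrected:
        return False
    return (
        outcome.tool_calls >= 5
        or outcome.recovered_from_error
        or outcome.non_obvious
    )
```

The interesting design choice is the first guard: a correction from you vetoes skill creation outright, so the library only accumulates approaches you implicitly signed off on.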
The result is an agent that gets faster and more accurate specifically on task types it has seen before. It doesn't improve uniformly on everything — it compounds on your workflows.
The Three-Layer Memory System
Memory in Hermes is not a single feature. It's a three-layer architecture where each layer has a specific job, a specific location on disk, and a specific moment when it activates. Understanding the layers separately matters for knowing what the agent actually knows at any point.
Layer 1 — Prompt Memory. Two curated files: MEMORY.md and USER.md, both stored in ~/.hermes/memories/. These load into the system prompt at the start of every session — before you send a single message. MEMORY.md holds environmental facts: your project structure, server configuration, naming conventions, lessons the agent learned the hard way. USER.md holds facts about you: your preferred communication style, your technical level, standing preferences, recurring decisions. The total character budget across both files is capped at 3,575 characters. That constraint is intentional — it forces curation over accumulation. The agent manages these files through an internal memory tool with three operations: add, replace, or remove. One important detail: edits made during a session take effect only in the next session, not mid-conversation.
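What a hard character budget implies for the memory tool is easiest to see in code. The class below is a hypothetical sketch, not the Hermes implementation; the only details taken from the docs are the two files, the three operations (add, replace, remove), and the 3,575-character combined cap.

```python
class PromptMemory:
    BUDGET = 3_575  # combined cap across MEMORY.md and USER.md

    def __init__(self) -> None:
        self.files: dict[str, list[str]] = {"MEMORY.md": [], "USER.md": []}

    def _total(self) -> int:
        # Budget is enforced across BOTH files, not per file
        return sum(len(fact) for facts in self.files.values() for fact in facts)

    def add(self, file: str, fact: str) -> None:
        self.files[file].append(fact)
        if self._total() > self.BUDGET:
            self.files[file].pop()  # roll back the write
            raise ValueError("over budget: curate, don't accumulate")

    def replace(self, file: str, old: str, new: str) -> None:
        idx = self.files[file].index(old)
        self.files[file][idx] = new
        if self._total() > self.BUDGET:
            self.files[file][idx] = old  # roll back the edit
            raise ValueError("over budget")

    def remove(self, file: str, fact: str) -> None:
        self.files[file].remove(fact)
```

The cap forces exactly the behavior the design intends: to add a new lesson once the budget is full, something older has to be replaced or removed first.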
Layer 2 — Episodic Archive. Every session is stored in a local SQLite database with FTS5 full-text search and LLM-powered summarization. When the agent needs to recall something from a conversation three weeks ago, it searches this archive on demand. It doesn't stuff your entire history into the context window; it retrieves only what's relevant to the current task. Where standard RAG pulls disconnected snippets, this archive maintains coherent context: the agent can reconstruct what was decided, what was tried, and what was ruled out across months of sessions.
Layer 3 — Skill Memory. The skill files described in the previous section. These aren't loaded by default — the system prompt includes only skill names and short summaries. When the agent determines a skill is relevant to the current task, it loads the full content. An agent with 200 skills pays roughly the same context cost as one with 40, because detailed skill content only enters the context when it's actually needed. Progressive disclosure, not bulk injection.
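The progressive-disclosure idea is why skill count barely affects context cost: the prompt carries only an index, and full skill bodies load on demand. A hypothetical sketch (the data shapes and example skills here are invented for illustration):

```python
def build_skill_index(skills: dict[str, dict]) -> str:
    """The system prompt gets only name + one-line summary per skill."""
    return "\n".join(f"- {name}: {s['summary']}" for name, s in skills.items())

def load_relevant(skills: dict[str, dict], chosen: list[str]) -> str:
    """Full skill content enters context only when the agent selects it."""
    return "\n\n".join(skills[name]["body"] for name in chosen)

# Invented example skills — any real library would hold hundreds of these
skills = {
    "deploy-staging": {
        "summary": "push a branch to staging via CI",
        "body": "deploy runbook text",
    },
    "db-migration": {
        "summary": "run schema migrations safely",
        "body": "migration runbook text",
    },
}
```

With this shape, the per-skill context cost of an unused skill is one index line, which is why 200 skills and 40 skills cost roughly the same until a skill is actually invoked.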
Three layers. Each one designed for a different retrieval pattern. The agent doesn't need to choose between them — it uses all three transparently.
Skill Creation and Storage
Skills in Hermes are markdown files. Human-readable, auditable, modifiable. You can open any skill file, read exactly what the agent wrote, edit it, or delete it. No black box. No opaque vector database that you can't inspect.
Each skill document captures: the task type it applies to, the approach that worked, the tools it relied on, pitfalls to avoid, and any edge cases the agent recovered from. Think of it as a concise runbook the agent wrote for itself — structured enough to follow reliably, short enough to fit in context efficiently.
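As a concrete illustration, here is what such a file might look like. This example is invented for this article, not taken from a real Hermes install; the section headings simply mirror the fields described above.

```markdown
# Skill: deploy-to-staging

## Task type
Deploying a branch of the API service to the staging environment.

## Approach that worked
1. Run the test suite locally before touching CI.
2. Push to the staging branch; CI handles the build.
3. Verify the health endpoint before reporting completion.

## Tools
terminal, browser (for the CI dashboard)

## Pitfalls
- The deploy script fails silently if the env file is missing; check for it first.
```

Because it's plain markdown, correcting a bad skill is an ordinary text edit, not a retraining step.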
Hermes ships with 40+ pre-built tools out of the box. The community skill library at agentskills.io follows an open standard, which means skills are portable and shareable across agents that support the format. You can install a community skill with a single command: hermes skills install [skill-name]. The agent can also generate skills on its own without being asked — when a task meets the creation threshold, the file appears in ~/.hermes/skills/ automatically.
Where It Runs and How to Talk to It
Hermes supports six deployment backends: local, Docker (with hardened read-only root), SSH, Daytona, Singularity, and Modal serverless. The Modal option is worth noting separately — it means near-zero idle cost. The agent's environment hibernates when it's not doing anything and wakes on demand. A VPS that runs 24/7 costs money even when idle. Modal doesn't.
On the platform side, Hermes connects to 15+ messaging platforms from a single gateway process: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, and several others including DingTalk and Home Assistant. Start a task on your laptop terminal in the morning, check the status from Telegram on your phone at lunch, get the completion notification in Slack. The session is the same. The agent doesn't fragment across interfaces.
For models, it supports 200+ through OpenRouter, plus direct integrations with Nous Portal, OpenAI Codex, Anthropic Claude, Kimi/Moonshot, MiniMax, and custom endpoints. Switch the model with hermes model. No code changes, no lock-in. The same agent, different reasoning engine.
It also ships with a built-in cron scheduler. Natural language scheduling — "every morning at 9am, check Hacker News for AI news and send me a summary on Telegram" — gets turned into a cron job that runs unattended. Daily reports, nightly backups, weekly audits. None of this requires writing a single line of code.
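Under the hood, a natural-language schedule has to reduce to an ordinary cron expression. The toy parser below illustrates that mapping only; it is invented for this article, and a real system would almost certainly delegate the translation to the model rather than a regex.

```python
import re

def to_cron(phrase: str) -> str:
    """Translate a small set of schedule phrases to cron syntax.
    Toy illustration: handles only 'every morning/day at Nam/pm'."""
    m = re.search(r"every (?:morning|day) at (\d{1,2})(am|pm)", phrase)
    if not m:
        raise ValueError(f"unrecognized schedule: {phrase!r}")
    hour = int(m.group(1)) % 12 + (12 if m.group(2) == "pm" else 0)
    return f"0 {hour} * * *"  # minute 0, given hour, every day
```

So "every morning at 9am, check Hacker News for AI news" reduces to the cron line `0 9 * * *`, with the rest of the sentence becoming the task prompt the job runs.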
Hermes vs OpenClaw: The Real Difference
OpenClaw is the dominant open-source agent framework right now — 345,000+ GitHub stars as of April 2026, an ecosystem of 5,700+ community skills, and a large developer community. Comparing the two is reasonable because they overlap on several features: both are open-source, self-hosted, have messaging integrations, memory, browser automation, scheduled tasks, and multi-agent workflows.
The cleanest way to describe the difference: Hermes packages a gateway around a learning agent. OpenClaw packages an agent around a messaging gateway.
OpenClaw is optimized for breadth of integration and team operations. Its architecture centers on routing, sessions, channel management, and a massive ecosystem of pre-built skills. If you need multi-channel orchestration, a large plugin marketplace, or team-level agent coordination, OpenClaw has the stronger story today.
Hermes is optimized for depth over time. Skills are not downloaded from a marketplace — they're generated from the agent's own completed work. Memory is not a feature bolted onto a session system — it's a three-layer architecture designed from the ground up for persistence. The agent compounds on your specific workflows, not on a generic set of pre-packaged capabilities.
OpenClaw has had a difficult 2026 on the security front. Multiple CVEs — including one (CVE-2026-25253, CVSS 8.8) involving token exfiltration through unsafe WebSocket behavior — and a supply chain attack on its ClawHub skill marketplace that surfaced 341 malicious skill entries in March. Hermes takes a conservative-by-default security posture: read-only root filesystems, namespace isolation, built-in prompt injection scanning, and no major public incidents as of April 2026.
A growing segment of experienced users has stopped treating this as either/or. The pattern that appears in Reddit threads: OpenClaw for multi-channel orchestration and planning, Hermes for focused execution tasks that benefit from a learning loop. Two tools with different strengths.
| Feature | Hermes Agent | OpenClaw |
|---|---|---|
| Primary focus | Self-improving agent runtime | Gateway-first assistant platform |
| Skill generation | Autonomous (agent creates from work) | Marketplace (ClawHub, 5,700+ skills) |
| Memory architecture | 3-layer (prompt + episodic + skill) | Markdown files + workspace memory |
| GitHub stars (Apr 2026) | ~87,500 | 345,000+ |
| Security posture | Conservative by default, no major CVEs | Multiple CVEs in 2026, supply chain incident |
| Deployment backends | 6 (local, Docker, SSH, Daytona, Singularity, Modal) | Local, Docker primarily |
| Model support | 200+ via OpenRouter + direct integrations | Multiple, strong Claude integration |
How to Get It Running
The installer handles everything in one command. Python, Node.js, the repo clone, the virtual environment, the global hermes command setup — all of it. On Linux, macOS, WSL2, and Android via Termux. Native Windows is not supported; WSL2 is required.
Run this in your terminal:
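The command itself follows the usual curl-pipe-to-shell pattern for one-line installers. The URL below is a placeholder, not the real installer address; take the actual command from the official documentation or the GitHub README before running anything.

```shell
# PLACEHOLDER URL — substitute the installer address from the official docs
curl -fsSL https://example.com/hermes-install.sh | bash
```

As with any pipe-to-shell installer, it's worth downloading the script and reading it before executing, especially for a tool you're about to give terminal access.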
After install, reload your shell — source ~/.bashrc or source ~/.zshrc — or you'll get a "command not found" error. First-timers skip this step more than you'd expect, and it wastes twenty minutes of troubleshooting.
Then run the setup wizard:
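The exact subcommand name is an assumption here; the shape matches the other commands the docs show (hermes gateway setup, hermes model), but verify it against the official documentation.

```shell
# Assumed subcommand — confirm against the official docs
hermes setup
```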
The wizard walks you through model provider selection, API key configuration, and optional messaging platform connection. For model providers, you have choices: Nous Portal (OAuth), OpenRouter (single API key, 200+ models), direct Anthropic or OpenAI API keys, or a custom endpoint if you're running a local model. The setup is interactive — it asks questions, you answer, it configures. No editing YAML by hand.
One requirement to know before you start: Hermes requires a model with at least 64,000 tokens of context. Models with smaller windows cannot maintain enough working memory for multi-step tool-calling and will be rejected at startup. Most cloud models meet this easily. Local models via Ollama or llama.cpp need to be configured with context size explicitly — something like --ctx-size 65536.
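For local backends, the context size must be set explicitly or the model will come up with a default window far below the minimum. The model names below are placeholders; the flags are standard llama.cpp and Ollama configuration.

```shell
# llama.cpp: set the context window at server start
llama-server -m ./model.gguf --ctx-size 65536

# Ollama: bake the context size into a model variant via a Modelfile:
#   FROM your-model:latest
#   PARAMETER num_ctx 65536
# then build the variant:
ollama create your-model-64k -f Modelfile
```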
After that, start it with hermes for the classic CLI or hermes --tui for the terminal UI with overlays and mouse selection. Both share the same sessions, slash commands, and config.
To connect a messaging platform — Telegram is the most common starting point — run hermes gateway setup. By default, the gateway denies all users not on an allowlist. That is the right default for a bot with terminal access to your machine. You add allowed users per platform with a DM pairing system — unknown users get a one-time pairing code when they message the bot, you approve them with hermes pairing approve telegram [CODE]. Codes expire after an hour.
One security note worth taking seriously: Hermes is a highly autonomous agent with terminal access. If you want it to run overnight on long tasks without asking for permission at every step, you're giving it significant system access. The right approach — run it on a separate machine, a VPS, or inside a Docker container with namespace isolation. Not on your primary computer with your personal accounts accessible. The blast radius matters.
Running hermes config set terminal.backend docker adds a meaningful layer of isolation. Don't skip it in production.
My Take
The honest version: Hermes Agent is technically impressive in ways that most coverage doesn't fully explain, and practically limited in ways that most coverage glosses over. Both things are true, and neither cancels the other.
The learning loop architecture is genuinely novel. Most agent frameworks treat memory as a feature addition — you bolt a memory module onto a stateless system and call it persistent. Hermes treats accumulation as the core design constraint. The three-layer separation — prompt memory, episodic archive, skill memory — each with a specific retrieval pattern and a specific token budget — reflects real engineering judgment about how context degrades at scale. An agent with 200 skills paying roughly the same context cost as one with 40 is not a marketing claim. It's a consequence of the progressive disclosure architecture. That part holds up under scrutiny.
What I'm less convinced about is self-evaluation reliability. The agent decides for itself whether a task outcome was good enough to warrant skill creation. That evaluation loop is only as reliable as the model doing the evaluating. Feedback from the community confirms this: manual edits to MEMORY.md or skill files sometimes get overwritten on the next update cycle. The agent's self-assessment of what to preserve doesn't always match what the user actually wants preserved. Not a dealbreaker, but not nothing either.
The comparison to OpenClaw also deserves more nuance than most takes give it. Hermes is not OpenClaw with better memory. It's a different architectural bet: depth over breadth. If your workflows are repetitive and structured — running the same class of tasks against the same codebase over months — Hermes compounds in ways OpenClaw simply doesn't. If your workflows are broad and one-off — constantly different task types, no recurring patterns — the learning loop has nothing to work with, and OpenClaw's ecosystem breadth wins. Know which category you're in before picking a framework.
Key Takeaways
- Hermes Agent is an open-source, self-hosted AI agent by Nous Research — launched February 2026, MIT licensed
- Its core differentiator: a closed learning loop that automatically creates reusable skill files from completed tasks
- Three-layer memory: prompt memory (MEMORY.md + USER.md), episodic archive (SQLite FTS5), and skill memory — each with a distinct retrieval pattern
- Supports 200+ models through OpenRouter; switch providers with one command, no code changes
- Connects to 15+ messaging platforms from a single gateway; sessions continue across interfaces
- Six deployment backends including Modal serverless — near-zero idle cost
- 64,000+ token context minimum required; most cloud models meet this, local models need explicit configuration
- Conservative security defaults by design; run in Docker or on a dedicated machine, not on your primary system
- Best suited for repetitive, structured workflows with recurring patterns — not one-off broad task coverage
FAQ
Is Hermes Agent free to use?
Yes. Hermes Agent is MIT licensed and free to install and run. You bring your own model API keys — which have their own costs depending on the provider — and optionally a VPS to host it. The agent software itself has no license fees, subscriptions, or usage limits imposed by Nous Research.
Does Hermes Agent work on Windows?
Native Windows is not supported. You need to install WSL2 (Windows Subsystem for Linux) and run Hermes from inside the WSL2 environment. The installer works normally once you're inside WSL2. The official docs have a dedicated WSL2 setup path.
Can I use Claude or GPT-4 as the model inside Hermes?
Yes. Hermes supports direct API integrations with Anthropic Claude, OpenAI GPT models, and 200+ others through OpenRouter. You configure the provider during initial setup with hermes model, and you can switch at any time without code changes. The only requirement is that the model supports at least 64,000 tokens of context.
How is Hermes Agent different from Claude Code or Codex?
Claude Code and Codex are session-based coding assistants — they work within an IDE or terminal session, and their context resets when the session ends. Hermes is a persistent agent that lives on your server, maintains memory across sessions indefinitely, builds skills from its own work, and reaches you through messaging platforms. It's designed for long-horizon, recurring workflows, not single-session coding tasks. The two are complementary — Hermes can actually call Claude Code or Codex as sub-agents and coordinate between them.
What happens to my data when Hermes runs?
Hermes has zero telemetry. Nothing is sent to Nous Research servers. All data — session history, MEMORY.md, USER.md, skill files — stays on your machine or your VPS. The only external calls are to whatever model provider you configure, and those are subject to that provider's data policies. No data leaves your infrastructure unless you explicitly set up an external service.
How much does it cost to run Hermes on a VPS?
The agent software is free. A basic VPS with 2 vCPU cores and 8GB RAM — sufficient for one Hermes instance with room to grow — runs around $8-10/month on most hosting providers. Model API costs vary by provider and usage volume. The Modal serverless backend option reduces infrastructure cost further: the environment hibernates when idle and you only pay for active compute time.
External sources: Hermes Agent Official Documentation · GitHub Repository (NousResearch) · The New Stack — OpenClaw vs Hermes Analysis
The most direct next step: run the one-line installer on a spare machine or a $10 VPS, connect it to Telegram, and give it one recurring workflow you handle manually right now. Something you do at least twice a week. Run it for ten days. The difference between day one and day ten is where the architecture either proves itself or doesn't — no amount of reading about learning loops substitutes for watching the skill library grow. Related reading on agentic AI: GPT-5.5 vs Claude Opus 4.7 Benchmark Breakdown and How Does GPT-5.5 Work: The Agentic Shift Explained.