Audience: Elmar / PKA team — adoption decision for the existing Luci/Lucienne/Larry stack which already runs on an OpenClaw-style architecture (Mission Control, scheduler, skills, file-first memory). Question: stay on OpenClaw, migrate to Hermes, or stack other coding agents inside the existing workers?
Date: 2026-05-04 · Slug: agents-comparison-2026-05
Executive Summary
- Two distinct categories. Persistent-memory orchestration agents (OpenClaw, Hermes Agent) live alongside coding-execution agents (Claude Code, Cursor, Cline, Aider, OpenHands, Devin, Goose, Codex CLI). They stack — they don't compete.
- OpenClaw is the de facto reference architecture for personal AI agents in 2026: 4-layer (Gateway → Agent → Tools → Memory), file-first Markdown memory, ~345K GitHub stars, MIT license, and reportedly the fastest GitHub repo ever to reach 100K stars (see Contested / Unverified). CVE-2026-33579 forced a security reckoning in Feb 2026; 135K exposed instances were found on the public internet.
- Hermes Agent (Nous Research, MIT, Feb 25 2026, 95-110K stars in 7-10 weeks) is the direct philosophical alternative: same multi-channel + persistent-memory pitch, plus a closed learning loop that auto-generates reusable skill files every ~15 tool calls. One reviewer (TokenMix.ai) reports a 40% task-time reduction after 30 days vs a fresh agent [unverified].
- For coding execution, the consensus winner is Claude Code (80.9% SWE-bench Verified, the favourite of 46% of developers in one developer survey, ~4% of public GitHub commits). Cursor dominates IDE-resident workflows, OpenHands is the open-source autonomous-PR option, Goose (now under AAIF/Linux Foundation) is the trend-setting OSS coding agent, and Devin is the polished-but-niche autonomous remote agent.
- Multi-agent is now table-stakes (Feb 2026): Grok Build (8 agents), Windsurf (5 parallel), Claude Code Agent Teams, Codex CLI Agents SDK, Devin parallel sessions all shipped within a 2-week window.
- For the PKA setup specifically, Luci is already about 90% an OpenClaw deployment, so the decision is incremental: backport Hermes' learning-loop pattern into the existing skills system, or plug Goose, Claude Code, or Codex CLI into worker dev-loops where they help.
State of Play
Persistent-Memory Orchestration Layer
| Framework | Stars | License | Architecture | Memory Model | Differentiator |
| --- | --- | --- | --- | --- | --- |
| OpenClaw | ~345K | MIT | 4-layer (Gateway/Agent/Tools/Memory), local gateway + ReAct loop | File-first Markdown + SQLite cache + optional Mem0 hybrid | Reference architecture; Task Brain unified scheduler (Mar 2026); rich skill ecosystem |
| Hermes Agent | 95-110K (Apr 2026) | MIT | Terminal UI + gateway + 6-channel messaging | 3-layer: short-term context + persistent skill library + cross-session FTS5 search | Self-improving learning loop auto-generates skills from experience |
| NemoClaw | n/a (commercial) | Proprietary | OpenClaw architecture commercialised | Hybrid SQL+vector | "Enterprise GA of the deep-agent pattern" |
Both OpenClaw and Hermes are model-agnostic (Claude / GPT / Gemini / DeepSeek / local Ollama / OpenRouter / Nous Portal). Both reach Telegram, Discord, Slack, WhatsApp, Signal, Email. Differences:
- OpenClaw: broader ecosystem, more skills, larger community, but session-to-session memory is "lossy" without the Mem0 plug-in, and the default config ships with no learning loop. Security: 9 reported CVEs to date (count per TokenMix, pending cross-check against NVD/MITRE); 40+ fixes shipped in v2026.2.12.
- Hermes: smaller community (~2.9K subs on r/hermesagent) and no web UI by default, but the Reflective Phase writes a new skill file every 15 tool calls. Skill compounding is the real differentiator. Zero CVEs to date (also per TokenMix, pending the same cross-check).
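The Reflective Phase described above can be pictured as a counter wrapped around tool dispatch. The following is a minimal sketch under stated assumptions, not Hermes' actual implementation: the class name `ReflectiveLoop`, the `summarise` callback, the `skill-NNNN.md` naming scheme, and the fixed interval of 15 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class ReflectiveLoop:
    """Hypothetical sketch: write one skill file after every `interval` tool calls."""

    skills_dir: Path
    summarise: Callable[[List[str]], str]  # assumed: turns a call log into skill text
    interval: int = 15                     # "every 15 tool calls" per the text above
    _log: List[str] = field(default_factory=list, init=False)
    _written: int = field(default=0, init=False)

    def record(self, tool_call: str) -> Optional[Path]:
        """Log one tool call; return the new skill path when a reflection fires."""
        self._log.append(tool_call)
        if len(self._log) < self.interval:
            return None
        self._written += 1
        self.skills_dir.mkdir(parents=True, exist_ok=True)
        path = self.skills_dir / f"skill-{self._written:04d}.md"
        path.write_text(self.summarise(self._log), encoding="utf-8")
        self._log.clear()  # start a fresh window for the next reflection
        return path
```

In a real agent the `summarise` step would presumably be an LLM call that distils the logged calls into a reusable procedure; here it is left as a plain callback so the control flow stays visible.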
Coding-Execution Agents
| Agent | Form Factor | Pricing | Stars | SWE-bench Verified | Notable |
| --- | --- | --- | --- | --- | --- |
| Claude Code (Anthropic) | Terminal CLI + IDE | $20–200/mo + API | n/a (closed) | 80.9% (Opus 4.5/4.6) | Agent Teams, MCP, computer-use Mar 2026, 4% of all public GitHub commits |
| Cursor 3 | AI-native IDE (VS Code fork) | $20+/mo, ARR $2B+ | n/a | n/a (model-dependent) | Cloud agents on isolated VMs, parallel Agent Tabs, /worktree, 30% of own PRs agent-made |
| Cline | VS Code extension | Free + BYOK | 30K+ | model-dependent | Plan/Code mode split, multi-model |
| Continue | VS Code/JetBrains plugin | Free + BYOK | 21K | n/a | Mature, plugin-style |
| Roo Code (formerly Roo Cline) | VS Code plugin | Free + BYOK | n/a | n/a | Custom personas, privacy-first |
| Aider | Terminal, git-native | Free + BYOK | 39K, 4.1M installs | n/a | 15B tokens/wk processed; every edit = git commit |
| OpenHands (formerly OpenDevin) | Cloud sandbox + CLI/SDK | OSS + cloud | 70K+ | 77% (Sonnet 4.5) | $18.8M Series A; V0→V1 split Nov 2025; OpenHands Index benchmark |
| Devin (Cognition) | Async remote agent | $20/mo + $2.25/ACU | n/a | 51.5% reported | Plan-first, Devin Review Jan 2026, 67% PR merge rate |
| Goose (Block→AAIF) | Desktop + CLI + API | Free OSS, Apache 2.0 | 27-29K | n/a | Recipes (YAML workflows), 3000+ MCP servers, 60% of Block's 12k staff use weekly |
| Codex CLI (OpenAI) | Terminal | OpenAI sub | n/a | 77.3% Terminal-Bench 2.0, 56.8% SWE-bench Pro | GPT-5.4-Codex leads SWE-bench Pro |
| Gemini CLI (Google) | Terminal | Free + BYOK | n/a | n/a | 2M-token context, sub-agent delegation |
| OpenCode | Terminal-native | Free + BYOK | 95K+ (Apr 2026) | n/a | 153K stars per one source, 75+ LLM providers, plan-first |
| Windsurf (formerly Codeium) | AI-native IDE | $15+/mo | n/a | n/a | Cascade agent, #1 LogRocket Feb 2026 |
| Jules (Google) | Proactive cloud agent | Google Cloud | n/a | n/a | Scans repos, auto-proposes work, 140K improvements |
| GitHub Copilot | IDE integrations | $10/mo Pro | 15M users | n/a | Universal default, opened to Claude+Codex Feb 2026 |
[unverified] OpenCode star count varies between 95K and 153K depending on source.
Benchmark Honest Caveats
- SWE-bench Verified is saturated. Top models cluster at 77–81%. SWE-bench Pro (2K+ unseen problems, 2026) is the cleaner signal: Codex 56.8%, Opus 4.6 55.4%.
- Terminal-Bench 2.0 is the better signal for autonomous overnight work. The "Claude Mythos" harness scores 92.1%; Codex CLI scores 77.3%. The roughly 15-point gap is meaningful.
- Scaffolding > model. A Feb 2026 test ran the same model through three different agent harnesses; results differed by 17 issues out of 731 problems. The orchestrator does real work.
- Real-world PR merge rate is what matters. Devin reports 67% on defined tasks. Anthropic reports that 70-80% of its technical employees use Claude Code daily.
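The caveats above mix absolute issue counts with percentage scores, so it can help to put them on one scale. The short calculation below is a sketch that uses only figures quoted in this section; the helper name `spread_in_points` is illustrative, and since the three figures come from different benchmarks the comparison is indicative rather than rigorous.

```python
def spread_in_points(issue_spread: int, total_problems: int) -> float:
    """Convert a spread measured in issues into percentage points of the benchmark."""
    return 100.0 * issue_spread / total_problems


# Same model, three harnesses: 17 issues apart on 731 problems.
harness_spread = spread_in_points(17, 731)

# Gaps between the scores quoted above, in percentage points.
terminal_bench_gap = 92.1 - 77.3   # "Claude Mythos" harness vs Codex CLI
swe_bench_pro_gap = 56.8 - 55.4    # Codex vs Opus 4.6

print(f"harness spread:     {harness_spread:.1f} pts")      # 2.3 pts
print(f"Terminal-Bench gap: {terminal_bench_gap:.1f} pts")  # 14.8 pts
print(f"SWE-bench Pro gap:  {swe_bench_pro_gap:.1f} pts")   # 1.4 pts
```

On these numbers the harness-induced spread (about 2.3 points) is larger than the quoted gap between the top two SWE-bench Pro entries (1.4 points), which is the sense in which scaffolding can outweigh model choice.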
Motivations / Why Each Tool Exists
- OpenClaw — solve the "agent forgets everything between sessions" problem with cheap, transparent, file-first memory; create a hackable reference for everyone else to build on. Fastest GitHub repo to 100K stars validated the demand.
- Hermes Agent — same problem, but argue that memory alone isn't enough — agents need to compound experience into reusable skills. The Reflective Phase is the architectural bet.
- Claude Code — narrow + deep: the best autonomous coding agent the underlying model can support, with as little orchestration tax as possible.
- Cursor / Windsurf — supervised IDE workflow for engineers who want AI as a pair, not as a delegate.
- Cline / Continue / Roo Code — "Cursor without lock-in": OSS extension equivalents, BYOK, model-agnostic.
- OpenHands / Devin / Jules — autonomous cloud worker: dispatch a ticket, get a PR back. Async, set-and-come-back.
- Goose — open the agent stack: any LLM, any platform, recipe-driven workflows, MCP-native, no proprietary harness.
- Aider — terminal git-native pair programmer. Every edit a commit. Boring on purpose.
- Codex CLI / Gemini CLI — OpenAI/Google's CLI front for their own model families. Purpose-built for vendor's strengths.
Beneficiaries / Adoption Signals
| Group | What they pick |
| --- | --- |
| Enterprise dev teams | Claude Code + Cursor + Copilot stack; Goose for open-source teams |
| Solo OSS devs | OpenCode / Aider / Cline + BYOK provider |
| Agent platform builders | OpenClaw (incumbent) or Hermes (compounding learning) |
| Background-task / async workflow runners | OpenHands, Devin, Jules |
| Voice/multi-channel automation | Hermes Agent, OpenClaw |
| Privacy-first / local-only | Goose + Ollama, Aider + local model |
| Block (the company) | 60% of 12,000 employees use Goose weekly |
| Anthropic (the company) | 70-80% of technical employees use Claude Code daily |
Scenarios / Probabilities (12-month outlook)
- (60%) Two-layer stack stabilises. Persistent-memory orchestration (OpenClaw / Hermes / NemoClaw) below, coding-execution agents (Claude Code, Cursor, Goose, OpenHands) above. Most teams run 2–3 tools.
- (25%) Hermes' learning-loop pattern gets backported into OpenClaw. OpenClaw absorbs the Reflective Phase concept and retains the lead through its ecosystem advantage; Hermes stays niche but influential.
- (10%) Claude Code subsumes the orchestration layer. Anthropic ships native multi-channel + cross-session learning, eating the bottom layer. Less likely because Anthropic prefers the "narrow + deep" position.
- (5%) A dark horse (Goose/AAIF, OpenHands V2, Cursor's cloud agents) becomes the convergence point. This is possible if Linux Foundation governance becomes the default.
Second-Order Effects
- CVE-2026-33579 (OpenClaw) becomes the security playbook for personal agents. WASM sandboxing for skills (OpenClaw Q2 2026 roadmap) likely becomes industry standard.
- MCP wins as the integration substrate. Goose, Claude Code, Cursor, OpenHands, Hermes, OpenClaw all converge on MCP. 3000+ MCP servers exist as of Apr 2026.
- Recipes / Skills become a portable asset. Goose Recipes (YAML), Claude Code skills, Hermes skill files, OpenClaw skills — convergence likely. Cross-agent skill markets emerging.
- Multi-agent coordination eats the next 2 quarters of attention. Whoever ships the cleanest task-graph + cross-agent state primitive wins. OpenClaw's Task Brain (Mar 2026) is the early reference.
- Pricing pressure on closed agents. When Hermes runs on a $5 VPS at $30-65/mo total, $200/mo Claude Code starts looking expensive for the same persistent-memory use case (though not the same coding depth).
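The pricing-pressure point can be made concrete with the list prices quoted in this document. The sketch below is a rough model only: it ignores API token spend (which the comparison table lists as extra for Claude Code), and the function and constant names are illustrative rather than taken from any vendor tooling.

```python
from typing import Tuple

# Figures quoted in this document (USD per month); API usage is not modelled.
HERMES_TOTAL_RANGE: Tuple[float, float] = (30.0, 65.0)  # "$5 VPS at $30-65/mo total"
CLAUDE_CODE_TOP_TIER = 200.0                            # top of the "$20–200/mo" range
DEVIN_BASE = 20.0                                       # "$20/mo"
DEVIN_PER_ACU = 2.25                                    # "$2.25/ACU"


def devin_monthly_cost(acus: float) -> float:
    """Devin subscription plus metered ACU usage for one month."""
    return DEVIN_BASE + DEVIN_PER_ACU * acus


def devin_break_even_acus(budget: float) -> float:
    """ACUs per month at which Devin's bill reaches the given budget."""
    return (budget - DEVIN_BASE) / DEVIN_PER_ACU


low, high = HERMES_TOTAL_RANGE
savings_vs_claude_code = (CLAUDE_CODE_TOP_TIER - high, CLAUDE_CODE_TOP_TIER - low)

print(savings_vs_claude_code)                       # (135.0, 170.0)
print(devin_break_even_acus(CLAUDE_CODE_TOP_TIER))  # 80.0
```

Read this way, a Hermes deployment undercuts the top Claude Code tier by $135-170 per month, and Devin reaches the same $200 mark at 80 ACUs per month. As the bullet above notes, these tools do not deliver the same coding depth, so the comparison only holds for the persistent-memory use case.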
Contested / Unverified
- Hermes 30-day skill compounding — "40% task-time reduction" cited only by TokenMix.ai. [unverified] — independent reproduction would help.
- OpenCode star count — 95K vs 153K depending on source. [unverified]
- Devin SWE-bench scores — 13.86% at launch, 51.5% in 2026 reviews. Cognition's official numbers and third-party numbers diverge. [unverified scope]
- Hermes 0 CVEs vs OpenClaw 9 — claimed by TokenMix; needs cross-reference with NVD/MITRE.
- OpenClaw "fastest repo to 100K stars in 2 days" — claimed but star history is gameable; treat as marketing-ish.
- Real-world PR merge rates for OpenHands and Devin diverge widely between vendor case studies (12x ROI, 80+ PRs/wk) and user reports (15-30% success on first attempt).
Specific PKA Fit Assessment
Luci's stack already implements the OpenClaw pattern: Mission Control (gateway + ticket board), scheduler (cron), skills (~/.claude/skills/), file-first memory (vault.db, MEMORY.md), Telegram channel, multi-provider (Anthropic / GLM / Kimi / MiniMax). Any migration work would therefore be small.
| Capability | Luci has it? | Hermes adds? | Other coding agent adds? |
| --- | --- | --- | --- |
| Persistent memory across sessions | ✅ MEMORY.md + vault.db | Marginal | n/a |
| Multi-channel (Telegram, Discord, …) | Partial (Telegram only) | ✅ +5 channels | n/a |
| File-first skills | ✅ 90+ skills | Equivalent | n/a |
| Self-improving learning loop | ❌ | ✅ Reflective Phase every 15 calls | n/a |
| Cron / scheduler | ✅ scheduler.py | Equivalent | n/a |
| Multi-provider model switching | ✅ provider-switch skill | Equivalent | n/a |
| Subagent dispatch | ✅ Agent tool, tessa, scott | Equivalent | n/a |
| Coding-depth on hard problems | ❌ depends on Anthropic | ❌ | ✅ Claude Code |
| Autonomous PR generation | Partial (Larry) | ❌ | ✅ OpenHands, Devin |
| Recipe-style portable workflows | Partial (skills) | ❌ | ✅ Goose Recipes |
Biggest concrete gap: the learning loop. It is worth backporting Hermes' Reflective Phase into Luci's existing skills system; the compound skill is already partway there.
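The fit table can also be read mechanically. The sketch below re-encodes its rows as data and filters for capabilities Luci lacks outright. The `Row` type, the yes/partial/no encoding of the Luci column, and the choice to treat "Marginal" and "Equivalent" as adding nothing new are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    capability: str
    luci: str                # "yes", "partial", or "no" (assumed encoding)
    hermes_adds: bool        # True only where the table shows a check mark for Hermes
    coding_agent_adds: bool  # True only where the table shows a check mark for a coding agent


ROWS: List[Row] = [
    Row("Persistent memory across sessions", "yes", False, False),
    Row("Multi-channel", "partial", True, False),
    Row("File-first skills", "yes", False, False),
    Row("Self-improving learning loop", "no", True, False),
    Row("Cron / scheduler", "yes", False, False),
    Row("Multi-provider model switching", "yes", False, False),
    Row("Subagent dispatch", "yes", False, False),
    Row("Coding depth on hard problems", "no", False, True),
    Row("Autonomous PR generation", "partial", False, True),
    Row("Recipe-style portable workflows", "partial", False, True),
]


def missing_in_luci(rows: List[Row], filled_by: str) -> List[str]:
    """Capabilities Luci lacks entirely that the named column would supply."""
    return [r.capability for r in rows if r.luci == "no" and getattr(r, filled_by)]


print(missing_in_luci(ROWS, "hermes_adds"))        # ['Self-improving learning loop']
print(missing_in_luci(ROWS, "coding_agent_adds"))  # ['Coding depth on hard problems']
```

Under this encoding Hermes closes exactly one outright gap, the learning loop, while the coding-depth gap is closed only by a coding-execution agent, which is consistent with the two-layer reading of the landscape.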
YouTube + Social Source Set (curated)
To be added to the NotebookLM notebook in Phase 2:
YouTube channels and focus
- Theo Browne (t3.gg) on Claude Code, Cursor, agent stack opinions
- Fireship on agent tool roundups
- Indy Dev Dan on Claude Code workflows
- Matthew Berman on agent benchmarking
- AI Jason on Cursor / Cline / OpenHands
- Greg Isenberg on agent landscape commentary
- Cole Medin on multi-agent setups
- David (Hermes Agent walkthrough): https://www.youtube.com/watch?v=4Sln_6K2z8c
- "Hermes agent just hit 57,000 GitHub stars" short: https://www.youtube.com/shorts/ns_X7wsm_HQ
Reddit threads / subs
- r/ClaudeAI, r/ClaudeCode — Claude Code rankings (226 community mentions)
- r/cursor — Cursor billing, parallel agents
- r/LocalLLaMA — Hermes / OpenClaw / Goose threads
- r/programming, r/ChatGPTCoding — broad rankings
- r/hermesagent (~2.9K subs)
X / Twitter named devs
- @swyx (latent.space) — agent landscape commentary
- @simonw — daily-driver hands-on
- @leerob (Vercel) — Cursor/Claude Code adoption
- @anthropic / @AnthropicAI — Claude Code releases
- @openai — Codex CLI
- @teknium (Nous Research) — Hermes founder
Tech blogs / publications
- The New Stack — "OpenClaw vs Hermes Agent: race to build AI assistants that never forget"
- every.to — agent commentary
- latent.space — agent commentary
- simonw blog — daily hands-on
- Anthropic Eng Blog — Claude Code release notes
Full URL Source List
OpenClaw
- https://abvijaykumar.medium.com/openclaw-a-deep-agent-realization-14125bbd5bad
- https://petronellatech.com/blog/openclaw-ai-agent-guide-2026
- https://robotpaper.ai/reference-architecture-openclaw-early-feb-2026-edition-opus-4-6/
- https://agenticera.fyi/2026/04/30/openclaw-the-open-source-agent-that-quietly-changed-everything/
- https://skywork.ai/skypage/en/openclaw-ai-memory/2048600764352425984
- https://github.com/rohitg00/awesome-openclaw
- https://bibek-poudel.medium.com/how-openclaw-works-understanding-ai-agents-through-a-real-architecture-5d59cc7a4764
Hermes Agent
- https://github.com/nousresearch/hermes-agent
- https://hermes-agent.nousresearch.com/
- https://hermes-agent.nousresearch.com/docs/user-stories
- https://github.com/0xNyk/awesome-hermes-agent
- https://github.com/NousResearch/hermes-agent-self-evolution
- https://tokenmix.ai/blog/hermes-agent-review-self-improving-open-source-2026
- https://hermes-agent.ai/blog/hermes-agent-review-2026
- https://medium.com/@sathishkraju/i-switched-from-openclaw-to-hermes-agent-heres-what-nobody-told-me-5f33a746b6ca
- https://www.youtube.com/watch?v=4Sln_6K2z8c
- https://www.youtube.com/shorts/ns_X7wsm_HQ
Hermes vs OpenClaw vs Claude Code (head-to-head)
- https://thenewstack.io/persistent-ai-agents-compared/
- https://utilo.io/en/home/blog/hermes-vs-claude-code-vs-openclaw-2026
- https://productivetechtalk.com/2026/04/09/ai-agents-2026-hermes-vs-openclo-vs-claude-code-exposed/
- https://www.bluehost.com/blog/hermes-agent-vs-claude-code/
- https://myclaw.ai/blog/hermes-agent-vs-claude-code
- https://www.browseract.com/blog/hermes-agent-vs-claude-code-cursor
- https://www.syntaxdispatch.com/blog/hermes-agent-vs-claude-code
- https://hermesatlas.com/guide/vs-claude-code/
Claude Code / Coding-agent rankings
- https://artificialanalysis.ai/agents/coding
- https://www.morphllm.com/ai-coding-agent
- https://frontman.sh/blog/best-open-source-ai-coding-tools-2026/
- https://www.taskade.com/blog/claude-code-alternatives
- https://www.faros.ai/blog/best-ai-coding-agents-2026
- https://www.opensourcealternatives.to/blog/best-open-source-ai-coding-assistants
- https://thoughts.jock.pl/p/ai-coding-harness-agents-2026
- https://mightybot.ai/blog/coding-ai-agents-for-accelerating-engineering-workflows/
- https://dev.to/soulentheo/every-ai-coding-cli-in-2026-the-complete-map-30-tools-compared-4gob
- https://www.digitalocean.com/resources/articles/claude-code-alternatives
OpenHands
- https://github.com/OpenHands/OpenHands
- https://openhands.dev/
- https://openhands.dev/blog/openhands-index
- https://vibecoding.app/blog/openhands-review
- https://localaimaster.com/blog/openhands-vs-swe-agent
- https://arxiv.org/abs/2407.16741
Devin (Cognition)
- https://devin.ai/pricing
- https://www.openaitoolshub.org/en/blog/devin-ai-review
- https://www.eesel.ai/blog/cognition-ai
- https://aitoolranked.com/blog/devin-ai-review
- https://www.digitalapplied.com/blog/devin-ai-autonomous-coding-complete-guide
- https://aiagentsquare.com/agents/devin.html
Goose (Block / AAIF)
- https://goose-docs.ai/
- https://github.com/aaif-goose/goose
- https://block.xyz/inside/block-open-source-introduces-codename-goose
- https://effloow.com/articles/goose-open-source-ai-agent-review-2026
- https://aitoolanalysis.com/goose-ai-review/
- https://www.openaitoolshub.org/en/blog/goose-ai-agent-block-review
- https://vibecodinghub.org/tools/goose
- https://www.kdnuggets.com/free-agentic-coding-with-goose
- https://www.aviator.co/podcast/block--ai-agents-goose
Reddit / community pulse
- https://www.aitooldiscovery.com/guides/best-ai-for-coding-reddit
- https://www.aitooldiscovery.com/guides/best-ai-agents-reddit
- https://claw.mobile/blog/best-ai-coding-tool-reddit-2026
- https://beginnersinai.org/best-ai-tools-reddit-2026/
- https://www.nxcode.io/resources/news/best-ai-for-coding-2026-complete-ranking
Methodology Notes
- WebSearch (Anthropic) for Hermes / OpenClaw / Goose / OpenHands / Devin / Reddit pulse: 4 queries returned ~40 unique credible URLs.
- Gemini CLI deep research (background, ~5 min) added comparative-matrix coverage.
- Codex CLI deep research (background) hit a sandbox issue but contributed source coverage on awesome-openclaw + Hermes ecosystem links.
- Cross-verification policy: any number cited above appears in ≥ 2 sources or is flagged [unverified].
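The cross-verification policy is simple enough to express as a check. The sketch below is one possible encoding, not the tooling actually used for this brief: the `Claim` type and `apply_policy` function are hypothetical, and the second example claim and its source names are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    text: str
    sources: List[str]  # where the figure was seen; free-form source names


def apply_policy(claim: Claim, min_sources: int = 2) -> str:
    """Return the claim text, tagged [unverified] if too few distinct sources back it."""
    distinct = {s.strip().lower() for s in claim.sources if s.strip()}
    if len(distinct) >= min_sources:
        return claim.text
    return f"{claim.text} [unverified]"


single = Claim("Hermes: 40% task-time reduction after 30 days", ["TokenMix.ai"])
double = Claim("Example figure", ["source-a.example", "source-b.example"])  # hypothetical

print(apply_policy(single))  # Hermes: 40% task-time reduction after 30 days [unverified]
print(apply_policy(double))  # Example figure
```

Source names are normalised before counting so that the same outlet cited twice under different capitalisation does not pass as two independent sources.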
Next Steps
- Create NotebookLM notebook with this seed + the curated URL set.
- Run NotebookLM Deep Research with a "deep web" query asking for:
  - Named-analyst commentary not yet in seed
  - Independent benchmark reproductions (especially Hermes 40% claim, Devin SWE-bench)
  - Migration case studies (OpenClaw → Hermes, Claude Code → Goose)
  - Security posture comparison (CVE histories)
  - Cost-modelling for a Luci-equivalent deployment on each
- Gap-analyse, iterate once if needed.
- Generate audio overview (richly framed for PKA decision), slide deck (visual matrix), and briefing report (CEO-style prescription).