AI Coding & Orchestration Agents 2026

Audience: Elmar / PKA team — adoption decision for the existing Luci/Lucienne/Larry stack which already runs on an OpenClaw-style architecture (Mission Control, scheduler, skills, file-first memory). Question: stay on OpenClaw, migrate to Hermes, or stack other coding agents inside the existing workers?

Date: 2026-05-04 · Slug: agents-comparison-2026-05

Executive Summary

Two distinct categories. Persistent-memory orchestration agents (OpenClaw, Hermes Agent) live alongside coding-execution agents (Claude Code, Cursor, Cline, Aider, OpenHands, Devin, Goose, Codex CLI). They stack — they don't compete.
OpenClaw is the de facto reference architecture for personal AI agents in 2026: 4-layer (Gateway → Agent → Tools → Memory), file-first Markdown memory, ~345K GitHub stars, MIT, and reportedly the fastest-ever GitHub repo to reach 100K stars. CVE-2026-33579 forced a security reckoning in Feb 2026; 135K exposed instances were found on the public internet.
Hermes Agent (Nous Research, MIT, released Feb 25 2026, 95-110K stars in 7-10 weeks) is the direct philosophical alternative: same multi-channel + persistent-memory pitch, plus a closed learning loop that auto-generates reusable skill files every ~15 tool calls. One reviewer (TokenMix.ai) reports a 40% task-time reduction after 30 days vs a fresh agent [unverified].
For coding execution, the consensus winner is Claude Code (80.9% SWE-bench Verified, named favourite by 46% of developers in the dev survey, ~4% of public GitHub commits). Cursor dominates the IDE-resident workflow, OpenHands is the open-source autonomous-PR option, Goose (now under AAIF/Linux Foundation) is the trend-setting OSS coding agent, and Devin is the polished-but-niche autonomous remote agent.
Multi-agent is now table stakes (Feb 2026): Grok Build (8 agents), Windsurf (5 parallel), Claude Code Agent Teams, Codex CLI Agents SDK, and Devin parallel sessions all shipped within a 2-week window.
For the PKA setup specifically: Luci is already roughly 90% of an OpenClaw deployment. The decision is incremental: backport Hermes' learning-loop pattern into the existing skills system, or plug Goose/Claude Code/Codex CLI into worker dev-loops where they help.

State of Play

Persistent-Memory Orchestration Layer

Framework	Stars	License	Architecture	Memory Model	Differentiator
OpenClaw	~345K	MIT	4-layer (Gateway/Agent/Tools/Memory), local gateway + ReAct loop	File-first Markdown + SQLite cache + optional Mem0 hybrid	Reference architecture; Task Brain unified scheduler (Mar 2026); rich skill ecosystem
Hermes Agent	95-110K (Apr 2026)	MIT	Terminal UI + gateway + 6-channel messaging	3-layer: short-term context + persistent skill library + cross-session FTS5 search	Self-improving learning loop auto-generates skills from experience
NemoClaw	n/a (commercial)	Proprietary	OpenClaw architecture commercialised	Hybrid SQL+vector	"Enterprise GA of the deep-agent pattern"

Both OpenClaw and Hermes are model-agnostic (Claude / GPT / Gemini / DeepSeek / local Ollama / OpenRouter / Nous Portal). Both reach Telegram, Discord, Slack, WhatsApp, Signal, Email. Differences:

OpenClaw: broader ecosystem, more skills, larger community, but session-by-session memory is "lossy" without the Mem0 plug-in. Default config ships with no learning loop. Security: 9 reported CVEs to date (count per TokenMix, not yet cross-checked against NVD/MITRE); 40+ fixes shipped in v2026.2.12.
Hermes: smaller community (~2.9K subs on r/hermesagent) and no web UI by default, but the Reflective Phase writes a new skill file every 15 tool calls. Skill compounding is the real differentiator. Zero CVEs to date (same TokenMix caveat).
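The file-first memory model both frameworks build on can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenClaw's or Hermes' actual code: the `FileFirstMemory` class, the `MEMORY.md` bullet layout, and the `notes` table are hypothetical names, and the optional Mem0 hybrid is not modelled. The point it shows is that the Markdown file is the source of truth and the SQLite cache can always be rebuilt from it.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List


class FileFirstMemory:
    """Hypothetical sketch: Markdown is the source of truth, SQLite is a cache."""

    def __init__(self, root: Path) -> None:
        self.md_path = root / "MEMORY.md"
        self.md_path.touch(exist_ok=True)
        # In-memory cache; losing it costs nothing because the file survives.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE notes (line_no INTEGER, text TEXT)")
        self.rebuild_cache()

    def remember(self, note: str) -> None:
        # Write to the Markdown file first, then refresh the cache from it.
        with self.md_path.open("a", encoding="utf-8") as fh:
            fh.write(f"- {note}\n")
        self.rebuild_cache()

    def rebuild_cache(self) -> None:
        # Throw the cache away and re-index every "- " bullet in the file.
        self.db.execute("DELETE FROM notes")
        lines = self.md_path.read_text(encoding="utf-8").splitlines()
        rows = [(i, ln[2:]) for i, ln in enumerate(lines, start=1) if ln.startswith("- ")]
        self.db.executemany("INSERT INTO notes VALUES (?, ?)", rows)

    def search(self, term: str) -> List[str]:
        # Plain substring match; a real system might use full-text search instead.
        cur = self.db.execute(
            "SELECT text FROM notes WHERE text LIKE ? ORDER BY line_no",
            (f"%{term}%",),
        )
        return [row[0] for row in cur]


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        mem = FileFirstMemory(Path(tmp))
        mem.remember("User prefers Telegram for alerts")
        mem.remember("Deploy window is Friday 16:00")
        return mem.search("Telegram")
```

Because the file stays human-readable and editable, a user can correct the agent's memory with a text editor, and the next cache rebuild picks the change up.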

Coding-Execution Agents

Agent	Form Factor	Pricing	Stars / Usage	Benchmark (SWE-bench Verified unless noted)	Notable
Claude Code (Anthropic)	Terminal CLI + IDE	$20–200/mo + API	n/a (closed)	80.9% (Opus 4.5/4.6)	Agent Teams, MCP, computer-use Mar 2026, 4% of all public GitHub commits
Cursor 3	AI-native IDE (VS Code fork)	$20+/mo, ARR $2B+	n/a	n/a (model-dependent)	Cloud agents on isolated VMs, parallel Agent Tabs, /worktree, 30% of own PRs agent-made
Cline	VS Code extension	Free + BYOK	30K+	model-dependent	Plan/Code mode split, multi-model
Continue	VS Code/JetBrains plugin	Free + BYOK	21K	n/a	Mature, plugin-style
Roo Code (formerly Roo Cline)	VS Code plugin	Free + BYOK	n/a	n/a	Custom personas, privacy-first
Aider	Terminal, git-native	Free + BYOK	39K, 4.1M installs	n/a	15B tokens/wk processed; every edit = git commit
OpenHands (formerly OpenDevin)	Cloud sandbox + CLI/SDK	OSS + cloud	70K+	77% (Sonnet 4.5)	$18.8M Series A; V0→V1 split Nov 2025; OpenHands Index benchmark
Devin (Cognition)	Async remote agent	$20/mo + $2.25/ACU	n/a	51.5% reported	Plan-first, Devin Review Jan 2026, 67% PR merge rate
Goose (Block→AAIF)	Desktop + CLI + API	Free OSS, Apache 2.0	27-29K	n/a	Recipes (YAML workflows), 3000+ MCP servers, 60% of Block's 12k staff use weekly
Codex CLI (OpenAI)	Terminal	OpenAI sub	n/a	77.3% Terminal-Bench 2.0, 56.8% SWE-bench Pro	GPT-5.4-Codex leads SWE-bench Pro
Gemini CLI (Google)	Terminal	Free + BYOK	n/a	n/a	2M-token context, sub-agent delegation
OpenCode	Terminal-native	Free + BYOK	95K+ (Apr 2026)	n/a	153K stars per one source, 75+ LLM providers, plan-first
Windsurf (formerly Codeium)	AI-native IDE	$15+/mo	n/a	n/a	Cascade agent, #1 LogRocket Feb 2026
Jules (Google)	Proactive cloud agent	Google Cloud	n/a	n/a	Scans repos, auto-proposes work, 140K improvements
GitHub Copilot	IDE integrations	$10/mo Pro	15M users	n/a	Universal default, opened to Claude+Codex Feb 2026

[unverified] OpenCode star count varies between 95K and 153K depending on source.

Honest Benchmark Caveats

SWE-bench Verified is saturated. Top models cluster at 77–81%. SWE-bench Pro (2K+ unseen problems, 2026) is the cleaner signal: Codex 56.8%, Opus 4.6 55.4%.
Terminal-Bench 2.0 is the better signal for autonomous overnight work. The "Claude Mythos" harness scores 92.1%; Codex CLI scores 77.3%. The 14.8-point gap is meaningful.
Scaffolding > model. A Feb 2026 test ran the same model through three different agent harnesses; results spread by 17 issues on 731 problems (about 2.3 percentage points). The orchestrator does real work.
Real-world PR merge rate is what matters. Devin reports 67% on defined tasks. Internally at Anthropic, 70-80% of technical employees reportedly use Claude Code daily.
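The gaps quoted in these caveats are simple arithmetic, and it can help to see them side by side. The sketch below uses only the scores given in the text; the helper names `gap_points` and `spread_points` are illustrative.

```python
def gap_points(higher: float, lower: float) -> float:
    """Difference between two percentage scores, in percentage points."""
    return round(higher - lower, 1)


def spread_points(issue_spread: int, total_problems: int) -> float:
    """Convert a spread in solved issues into percentage points of the benchmark."""
    return round(100 * issue_spread / total_problems, 1)


# Terminal-Bench 2.0: "Claude Mythos" harness vs Codex CLI
terminal_bench_gap = gap_points(92.1, 77.3)

# SWE-bench Pro: Codex vs Opus 4.6
swe_pro_gap = gap_points(56.8, 55.4)

# Same model, three harnesses: 17 issues apart on 731 problems
harness_spread = spread_points(17, 731)

print(terminal_bench_gap, swe_pro_gap, harness_spread)  # 14.8 1.4 2.3
```

Read together, the harness spread (2.3 points) is larger than the SWE-bench Pro gap between the two leading models (1.4 points), which is the quantitative core of the "scaffolding > model" claim.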

Motivations / Why Each Tool Exists

OpenClaw: solve the "agent forgets everything between sessions" problem with cheap, transparent, file-first memory, and create a hackable reference for everyone else to build on. Its reported record pace to 100K stars suggests the demand is real.
Hermes Agent: same problem, but argues that memory alone isn't enough; agents need to compound experience into reusable skills. The Reflective Phase is the architectural bet.
Claude Code: narrow + deep. The best autonomous coding agent the underlying model can support, with as little orchestration tax as possible.
Cursor / Windsurf: supervised IDE workflow for engineers who want AI as a pair, not as a delegate.
Cline / Continue / Roo Code: "Cursor without lock-in". OSS extension equivalents, BYOK, model-agnostic.
OpenHands / Devin / Jules: autonomous cloud worker. Dispatch a ticket, get a PR back. Async, set-and-come-back.
Goose: open the agent stack. Any LLM, any platform, recipe-driven workflows, MCP-native, no proprietary harness.
Aider: terminal git-native pair programmer. Every edit a commit. Boring on purpose.
Codex CLI / Gemini CLI: OpenAI's and Google's CLI fronts for their own model families. Purpose-built for each vendor's strengths.

Beneficiaries / Adoption Signals

Group	What they pick
Enterprise dev teams	Claude Code + Cursor + Copilot stack; Goose for open-source teams
Solo OSS devs	OpenCode / Aider / Cline + BYOK provider
Agent platform builders	OpenClaw (incumbent) or Hermes (compounding learning)
Background-task / async workflow runners	OpenHands, Devin, Jules
Voice/multi-channel automation	Hermes Agent, OpenClaw
Privacy-first / local-only	Goose + Ollama, Aider + local model
Block (the company)	60% of 12,000 employees use Goose weekly
Anthropic (the company)	70-80% of technical employees use Claude Code daily

Scenarios / Probabilities (12-month outlook)

(60%) Two-layer stack stabilises. Persistent-memory orchestration (OpenClaw / Hermes / NemoClaw) below, coding-execution agents (Claude Code, Cursor, Goose, OpenHands) above. Most teams run 2–3 tools.
(25%) Hermes' learning-loop pattern gets backported into OpenClaw. OpenClaw absorbs the Reflective Phase concept and retains the lead via its ecosystem advantage. Hermes stays niche but influential.
(10%) Claude Code subsumes the orchestration layer. Anthropic ships native multi-channel + cross-session learning, eating the bottom layer. Less likely because Anthropic prefers the "narrow + deep" position.
(5%) A dark horse (Goose/AAIF, OpenHands V2, Cursor's cloud agents) becomes the convergence point. Possible if Linux Foundation governance becomes the default.

Second-Order Effects

CVE-2026-33579 (OpenClaw) becomes the security playbook for personal agents. WASM sandboxing for skills (OpenClaw Q2 2026 roadmap) likely becomes industry standard.
MCP wins as the integration substrate. Goose, Claude Code, Cursor, OpenHands, Hermes, OpenClaw all converge on MCP. 3000+ MCP servers exist as of Apr 2026.
Recipes / Skills become a portable asset. Goose Recipes (YAML), Claude Code skills, Hermes skill files, OpenClaw skills — convergence likely. Cross-agent skill markets emerging.
Multi-agent coordination eats the next 2 quarters of attention. Whoever ships the cleanest task-graph + cross-agent state primitive wins. OpenClaw's Task Brain (Mar 2026) is the early reference.
Pricing pressure on closed agents. When Hermes runs on a $5 VPS at $30-65/mo total, $200/mo Claude Code starts looking expensive for the same persistent-memory use case (though not the same coding depth).
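The pricing-pressure point can be made concrete by annualising the monthly figures quoted above. This is a back-of-envelope sketch only: it assumes the $200/mo figure is the top Claude Code tier with no additional API spend, and that the $30-65/mo Hermes total already includes the VPS and model usage.

```python
def annual_cost(monthly_usd: int) -> int:
    """Annualise a flat monthly cost in USD."""
    return monthly_usd * 12


hermes_low = annual_cost(30)        # low end of the quoted Hermes total
hermes_high = annual_cost(65)       # high end of the quoted Hermes total
claude_code_top = annual_cost(200)  # top Claude Code tier, API usage excluded

# Yearly saving range if Hermes covers the same persistent-memory use case
saving_range = (claude_code_top - hermes_high, claude_code_top - hermes_low)
print(hermes_low, hermes_high, claude_code_top, saving_range)
# 360 780 2400 (1620, 2040)
```

The comparison only holds for the persistent-memory use case; it says nothing about coding depth, where the two tools are not substitutes.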

Contested / Unverified

Hermes 30-day skill compounding: the "40% task-time reduction" figure is cited only by TokenMix.ai [unverified]; independent reproduction would help.
OpenCode star count — 95K vs 153K depending on source. [unverified]
Devin SWE-bench scores — 13.86% at launch, 51.5% in 2026 reviews. Cognition's official numbers and third-party numbers diverge. [unverified scope]
Hermes 0 CVEs vs OpenClaw 9 — claimed by TokenMix; needs cross-reference with NVD/MITRE.
OpenClaw "fastest repo to 100K stars in 2 days" — claimed but star history is gameable; treat as marketing-ish.
Real-world PR merge rates for OpenHands and Devin diverge widely between vendor case studies (12x ROI, 80+ PRs/wk) and user reports (15-30% success on first attempt).

Specific PKA Fit Assessment

Luci's stack already implements the OpenClaw pattern: Mission Control (gateway + ticket board), scheduler (cron), skills (~/.claude/skills/), file-first memory (vault.db, MEMORY.md), Telegram channel, multi-provider (Anthropic / GLM / Kimi / MiniMax). Because the pattern is already in place, the work to migrate is small.

Capability	Luci has it?	Hermes adds?	Other coding agent adds?
Persistent memory across sessions	✅ MEMORY.md + vault.db	Marginal	n/a
Multi-channel (Telegram, Discord, …)	Partial (Telegram only)	✅ +5 channels	n/a
File-first skills	✅ 90+ skills	Equivalent	n/a
Self-improving learning loop	❌	✅ Reflective Phase every 15 calls	n/a
Cron / scheduler	✅ scheduler.py	Equivalent	n/a
Multi-provider model switching	✅ provider-switch skill	Equivalent	n/a
Subagent dispatch	✅ Agent tool, tessa, scott	Equivalent	n/a
Coding-depth on hard problems	❌ depends on Anthropic	❌	✅ Claude Code
Autonomous PR generation	Partial (Larry)	❌	✅ OpenHands, Devin
Recipe-style portable workflows	Partial (skills)	❌	✅ Goose Recipes

Biggest concrete gap: the learning loop. Worth backporting Hermes' Reflective Phase (compound skill is partway there) into Luci's existing skills system.
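A backport of the learning loop could start from something as small as the sketch below. It is not Hermes' implementation: the `ReflectiveLoop` class, the `skill-NNN.md` naming, and the file layout are hypothetical, and only the 15-call interval comes from the text. A real reflection step would ask the model to distil the buffered calls into a reusable procedure; here the raw steps are written out as a placeholder.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional

REFLECT_EVERY = 15  # interval from the text; everything else here is assumed


@dataclass
class ReflectiveLoop:
    skills_dir: Path
    reflect_every: int = REFLECT_EVERY
    _buffer: List[str] = field(default_factory=list, init=False)
    _skills_written: int = field(default=0, init=False)

    def record_tool_call(self, description: str) -> Optional[Path]:
        """Log one tool call; return the new skill file when a reflection fires."""
        self._buffer.append(description)
        if len(self._buffer) < self.reflect_every:
            return None
        return self._reflect()

    def _reflect(self) -> Path:
        # Placeholder for the model-driven summarisation step: dump the
        # buffered calls as a numbered draft, then start a fresh buffer.
        self._skills_written += 1
        path = self.skills_dir / f"skill-{self._skills_written:03d}.md"
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(self._buffer, start=1))
        path.write_text(
            f"# Draft skill {self._skills_written}\n\n{steps}\n", encoding="utf-8"
        )
        self._buffer.clear()
        return path


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        loop = ReflectiveLoop(Path(tmp))
        written = []
        for n in range(1, 31):  # 30 tool calls should trigger two reflections
            result = loop.record_tool_call(f"tool call {n}")
            if result is not None:
                written.append(result.name)
        return written
```

Writing drafts into a separate directory rather than straight into the live skills folder would leave room for a review step before a generated skill is trusted.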

YouTube + Social Source Set (curated)

To be added to the NotebookLM notebook in Phase 2:

YouTube (channel : URL pattern : focus)
Theo Browne (t3.gg) on Claude Code, Cursor, agent stack opinions
Fireship on agent tool roundups
Indy Dev Dan on Claude Code workflows
Matthew Berman on agent benchmarking
AI Jason on Cursor / Cline / OpenHands
Greg Isenberg on agent landscape commentary
Cole Medin on multi-agent setups
David (Hermes Agent walkthrough): https://www.youtube.com/watch?v=4Sln_6K2z8c
"Hermes agent just hit 57,000 GitHub stars" short: https://www.youtube.com/shorts/ns_X7wsm_HQ

Reddit threads / subs
r/ClaudeAI, r/ClaudeCode: Claude Code rankings (226 community mentions)
r/cursor: Cursor billing, parallel agents
r/LocalLLaMA: Hermes / OpenClaw / Goose threads
r/programming, r/ChatGPTCoding: broad rankings
r/hermesagent (~2.9K subs)

X / Twitter named devs
@swyx (latent.space): agent landscape commentary
@simonw: daily-driver hands-on
@leerob (Vercel): Cursor/Claude Code adoption
@anthropic / @AnthropicAI: Claude Code releases
@openai: Codex CLI
@teknium (Nous Research): Hermes founder

Tech blogs / publications
The New Stack: "OpenClaw vs Hermes Agent: race to build AI assistants that never forget"
every.to: agent commentary
latent.space: agent commentary
simonw blog: daily hands-on
Anthropic Eng Blog: Claude Code release notes

Full URL Source List

OpenClaw

https://abvijaykumar.medium.com/openclaw-a-deep-agent-realization-14125bbd5bad
https://petronellatech.com/blog/openclaw-ai-agent-guide-2026
https://robotpaper.ai/reference-architecture-openclaw-early-feb-2026-edition-opus-4-6/
https://agenticera.fyi/2026/04/30/openclaw-the-open-source-agent-that-quietly-changed-everything/
https://skywork.ai/skypage/en/openclaw-ai-memory/2048600764352425984
https://github.com/rohitg00/awesome-openclaw
https://bibek-poudel.medium.com/how-openclaw-works-understanding-ai-agents-through-a-real-architecture-5d59cc7a4764

Hermes Agent

https://github.com/nousresearch/hermes-agent
https://hermes-agent.nousresearch.com/
https://hermes-agent.nousresearch.com/docs/user-stories
https://github.com/0xNyk/awesome-hermes-agent
https://github.com/NousResearch/hermes-agent-self-evolution
https://tokenmix.ai/blog/hermes-agent-review-self-improving-open-source-2026
https://hermes-agent.ai/blog/hermes-agent-review-2026
https://medium.com/@sathishkraju/i-switched-from-openclaw-to-hermes-agent-heres-what-nobody-told-me-5f33a746b6ca
https://www.youtube.com/watch?v=4Sln_6K2z8c
https://www.youtube.com/shorts/ns_X7wsm_HQ

Hermes vs OpenClaw vs Claude Code (head-to-head)

https://thenewstack.io/persistent-ai-agents-compared/
https://utilo.io/en/home/blog/hermes-vs-claude-code-vs-openclaw-2026
https://productivetechtalk.com/2026/04/09/ai-agents-2026-hermes-vs-openclo-vs-claude-code-exposed/
https://www.bluehost.com/blog/hermes-agent-vs-claude-code/
https://myclaw.ai/blog/hermes-agent-vs-claude-code
https://www.browseract.com/blog/hermes-agent-vs-claude-code-cursor
https://www.syntaxdispatch.com/blog/hermes-agent-vs-claude-code
https://hermesatlas.com/guide/vs-claude-code/

Claude Code / Coding-agent rankings

https://artificialanalysis.ai/agents/coding
https://www.morphllm.com/ai-coding-agent
https://frontman.sh/blog/best-open-source-ai-coding-tools-2026/
https://www.taskade.com/blog/claude-code-alternatives
https://www.faros.ai/blog/best-ai-coding-agents-2026
https://www.opensourcealternatives.to/blog/best-open-source-ai-coding-assistants
https://thoughts.jock.pl/p/ai-coding-harness-agents-2026
https://mightybot.ai/blog/coding-ai-agents-for-accelerating-engineering-workflows/
https://dev.to/soulentheo/every-ai-coding-cli-in-2026-the-complete-map-30-tools-compared-4gob
https://www.digitalocean.com/resources/articles/claude-code-alternatives

OpenHands

https://github.com/OpenHands/OpenHands
https://openhands.dev/
https://openhands.dev/blog/openhands-index
https://vibecoding.app/blog/openhands-review
https://localaimaster.com/blog/openhands-vs-swe-agent
https://arxiv.org/abs/2407.16741

Devin (Cognition)

https://devin.ai/pricing
https://www.openaitoolshub.org/en/blog/devin-ai-review
https://www.eesel.ai/blog/cognition-ai
https://aitoolranked.com/blog/devin-ai-review
https://www.digitalapplied.com/blog/devin-ai-autonomous-coding-complete-guide
https://aiagentsquare.com/agents/devin.html

Goose (Block / AAIF)

https://goose-docs.ai/
https://github.com/aaif-goose/goose
https://block.xyz/inside/block-open-source-introduces-codename-goose
https://effloow.com/articles/goose-open-source-ai-agent-review-2026
https://aitoolanalysis.com/goose-ai-review/
https://www.openaitoolshub.org/en/blog/goose-ai-agent-block-review
https://vibecodinghub.org/tools/goose
https://www.kdnuggets.com/free-agentic-coding-with-goose
https://www.aviator.co/podcast/block--ai-agents-goose

Reddit / community pulse

https://www.aitooldiscovery.com/guides/best-ai-for-coding-reddit
https://www.aitooldiscovery.com/guides/best-ai-agents-reddit
https://claw.mobile/blog/best-ai-coding-tool-reddit-2026
https://beginnersinai.org/best-ai-tools-reddit-2026/
https://www.nxcode.io/resources/news/best-ai-for-coding-2026-complete-ranking

Methodology Notes

WebSearch (Anthropic) for Hermes / OpenClaw / Goose / OpenHands / Devin / Reddit pulse: 4 queries returned ~40 unique credible URLs.
Gemini CLI deep research (background, ~5 min) added comparative-matrix coverage.
Codex CLI deep research (background) hit a sandbox issue but contributed source coverage on awesome-openclaw + Hermes ecosystem links.
Cross-verification policy: any number cited above appears in ≥ 2 sources or is flagged [unverified].
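The cross-verification rule is mechanical enough to express as a check. The sketch below is one possible reading of the policy, not the tooling actually used for this dossier: `apply_policy` is a hypothetical helper, and apart from the Hermes claim (which the text attributes to TokenMix.ai alone) the claims and sources are placeholders.

```python
from typing import Dict, List

UNVERIFIED_TAG = "[unverified]"


def apply_policy(claims: Dict[str, List[str]], min_sources: int = 2) -> List[str]:
    """Tag every claim that is backed by fewer than `min_sources` distinct sources."""
    labelled = []
    for claim, sources in claims.items():
        # Normalise so the same site cited twice counts as one source.
        distinct = {s.strip().lower() for s in sources}
        if len(distinct) >= min_sources:
            labelled.append(claim)
        else:
            labelled.append(f"{claim} {UNVERIFIED_TAG}")
    return labelled


claims = {
    "Hermes: 40% task-time reduction after 30 days": ["TokenMix.ai"],
    # Placeholder entries below, for illustration only.
    "Example claim with two sources": ["source-a.example", "source-b.example"],
    "Example claim cited twice by one site": ["source-a.example", "Source-A.example"],
}

result = apply_policy(claims)
for line in result:
    print(line)
```

Counting distinct sources rather than citations matters here, since several of the review sites in the source list appear to repeat each other's numbers.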

Next Steps

Create NotebookLM notebook with this seed + the curated URL set.
Run NotebookLM Deep Research with a "deep web" query asking for:
  Named-analyst commentary not yet in seed
  Independent benchmark reproductions (especially Hermes 40% claim, Devin SWE-bench)
  Migration case studies (OpenClaw → Hermes, Claude Code → Goose)
  Security posture comparison (CVE histories)
  Cost-modelling for a Luci-equivalent deployment on each
Gap-analyse, iterate once if needed.
Generate audio overview (richly framed for PKA decision), slide deck (visual matrix), and briefing report (CEO-style prescription).

AI Coding & Orchestration Agents 2026 — Seed Dossier