Orchestration & Delegation Framework for Elmar

Date: 2026-05-31
Author: Lucienne (architectural review)
Status: Practical guide — how to run a 7-person virtual team with Hermes + Claude Code

1. The Simple Mental Model: "A Restaurant Kitchen"

Imagine your team as a restaurant kitchen:

Role	Kitchen Analogy	What They Actually Do
Planner	Head chef / Expediter	Reads the order, breaks it into steps, decides who cooks what
Designer	Pastry chef / Plating lead	Makes it look beautiful and feel right before it goes out
Builder/Coder	Line cook	Cooks the dish — writes the code, edits the files
Reviewer	Sous chef tasting	Tastes before it leaves — code review, logic check
QA Coder	Food safety inspector	Runs tests, checks for bugs, verifies it works
QA Design	Visual inspector	Checks the plate looks right on mobile and desktop
Browser Tester	Mystery shopper	Actually sits down and tries to eat the meal — real user flow

The key insight: Not every order needs all seven people. A cup of coffee (fix a typo) just needs the Builder. A tasting menu (new feature) needs the full kitchen.

2. How Work Flows: The "Ticket → Runtime → Evidence" Loop

Elmar has an idea
    ↓
Planner (Luci) creates a ticket in Mission Control
    ↓
Planner decides: "Is this simple or complex?"
    ↓
    ├─ Simple → Builder does it directly in one runtime session
    └─ Complex → Designer briefs → Builder codes → Reviewer checks → QA tests → Browser Tester validates
    ↓
Evidence (screenshots, tests, diffs) harvests back into the ticket
    ↓
Planner shows Elmar: "Done. Here's the proof."

The golden rule: Every piece of work becomes a ticket in Mission Control. Every ticket gets a runtime (a live working session). Every runtime produces evidence that gets recorded back into the ticket.

3. Mapping Your 7-Person Team to Actual Tools

The Team → Tool Mapping

Team Role	Primary Tool	When to Use	Concrete Example
Planner	Luci (Hermes/Claude Code in tmux)	Every ticket starts here	"Create ticket MC-4500: fix login button on mobile"
Designer	Luci with design rubric + browser vision	UI/UX tickets before coding	"Design brief: 375px mobile, chat-first layout, sticky composer"
Builder/Coder	Claude Code CLI (`claude` in tmux)	All code work	`claude --mcp-config mc-coord-mcp-config.json` in ticket runtime
Reviewer	Claude Code or Codex CLI (`codex`)	Code review phase	Spawn Codex in tmux, feed it the diff + brief, ask for verdict
QA Coder	Claude Code + pytest/regression scripts	Test verification	Run `scripts/mc_regression.sh`, attach output to ticket
QA Design	Browser tool + vision model (Tessa)	Visual/mobile validation	Screenshot at 375px, vision model checks spacing, tap targets
Browser Tester	Browser automation + Claude Code	End-to-end user flows	Navigate real routes, fill forms, verify console errors

Where the "People" Actually Live

Your 7-person team is not 7 separate computers. They are roles that run in runtime sessions:

Planner = Luci's persistent tmux session (mc-chat-luci or ticket runtime)
Builder = Claude Code CLI running in a ticket-specific tmux session
Designer = Luci with browser/vision tools, or a subagent with design skills
Reviewer = A second Claude Code or Codex runtime spawned for review
QA = Claude Code with test tools, or deterministic scripts
Browser Tester = Claude Code with browser MCP tools

One person (Claude Code) can wear multiple hats — but only one hat at a time per runtime. The Planner decides when to switch hats or spawn a new worker.

4. Decision Tree: Which Tool for Which Job?

START: You have work to do.
    │
    ▼
┌─────────────────────────────────────┐
│ Is this a recurring health check    │
│ or deterministic script?            │
└─────────────────────────────────────┘
    │
    ├─ YES → Use Hermes cronjob with `no_agent: true`
    │         (cheap, runs every 5-15 min, only calls LLM on events)
    │
    └─ NO → Continue
              │
              ▼
┌─────────────────────────────────────┐
│ Does this need file edits, shell,   │
│ or MCP tools?                       │
└─────────────────────────────────────┘
    │
    ├─ YES → Use a CLI runtime (Claude Code, Hermes, Codex)
    │         in a tmux session
    │
    └─ NO → Continue
              │
              ▼
┌─────────────────────────────────────┐
│ Is this a quick analysis, summary,  │
│ or triage with no tools needed?     │
└─────────────────────────────────────┘
    │
    ├─ YES → Use direct API call (cheap, fast, no tmux needed)
    │         Example: "Summarize these 10 tickets"
    │
    └─ NO → Continue
              │
              ▼
┌─────────────────────────────────────┐
│ Is this a bounded, focused task     │
│ that needs isolation from the main  │
│ runtime?                            │
└─────────────────────────────────────┘
    │
    ├─ YES → Use Hermes `delegate_task` (subagent)
    │         Spawns a fresh session, clean context, returns result
    │         Good for: research, council review, one-off analysis
    │
    └─ NO → Use the main ticket runtime (Claude Code in tmux)
              This is the default for most work.

Quick Reference Table

Situation	Tool	Why
Health check every 5 min	Hermes cronjob `no_agent`	Costs ~$0.01/day, only wakes LLM on events
Write code, edit files	Claude Code CLI in tmux	Full tool access, file edits, MCP, skills
Quick summary of tickets	Direct API (Kimi/GLM)	Cheap, fast, no setup
Code review / second opinion	Codex CLI or subagent	Fresh eyes, isolated context
Visual QA / mobile check	Browser + vision model	Real screenshots, real DOM
Research / scout work	Subagent with web search	Bounded, doesn't clutter main runtime
Long-running task ( > 30 min)	`terminal(background=true)`	Runs without blocking, you check later
Urgent fix, needs hands-on	Claude Code directly in tmux	Fastest path from brain to file

5. Maximizing Your Claude Code Subscription

Claude Code is your premium tool. Here's how to get the most from it:

What Claude Code Is Best At

Code generation and editing — it's the best coder on the team
Complex reasoning with tools — MCP, browser, file system
Interactive debugging — live terminal, run tests, inspect errors
Code review — reading diffs, spotting bugs, suggesting fixes

What Claude Code Is NOT For

Cheap polling — don't burn Claude tokens on "check if anything changed" loops
Simple summaries — use Kimi/GLM API for that (10x cheaper)
Deterministic scripts — use Python/bash cronjobs, no LLM needed

The 80/20 Rule for Claude Code

Use Claude Code For	Don't Use Claude Code For
Writing and editing code	Health checks and polling
Interactive debugging	Simple text summaries
Complex architectural decisions	Deterministic data transforms
Code review and QA	Scheduled status reports
Browser-based user flow testing	Bulk ticket triage

Concrete Pattern: "Claude Does the Hard Stuff, Others Do the Rest"

Ticket arrives
    ↓
Planner (Luci, cheap Hermes/Kimi) reads and routes it
    ↓
Builder (Claude Code) writes the code
    ↓
Reviewer (Codex or Claude Code second pass) checks the diff
    ↓
QA Coder (Claude Code + pytest) runs tests
    ↓
QA Design (browser + vision model) checks mobile screenshot
    ↓
Planner (Luci) synthesizes and reports to Elmar

Cost math: If Claude Code costs $0.50 per code task and Kimi costs $0.02 per routing task, you save 90% by keeping Claude for code and using cheaper models for routing.

6. Keeping Visual and Design Consistency

The Design Brief Pattern

Before any Builder starts coding, the Planner must attach a design brief to the ticket. This is a checklist:

DESIGN BRIEF for MC-4500
- Target: Mobile login screen (375px viewport)
- User flow: Tap "Login" → enter credentials → tap "Submit"
- Visual standard: Follow Material 3 bottom-sheet pattern
- Colors: Use existing CSS variables (--mc-primary, --mc-surface)
- Touch targets: Minimum 48px for all buttons
- Composer: Sticky bottom, never overlap content
- Evidence required: 375px screenshot + desktop screenshot + console log

The rule: No UI ticket moves to "In Progress" without a design brief. The brief is created by the Planner (with Designer input if needed) and enforced by the Reviewer.

The Visual QA Gate

For every UI ticket, the Browser Tester must provide:

375px mobile screenshot — actual browser render, not DOM theory
Desktop screenshot — verify it still looks good
Console log — no JavaScript errors
Tap/scroll verification — actual interaction, not just visual

The gate: A ticket cannot move to "Done" without these four items. Mission Control enforces this with the mobile_review_required flag.

Consistency Through Reuse, Not Memory

Don't rely on the model to "remember" the design system. Instead:

Reference files: Point to docs/design-system.md or styles/variables.css
Design rubrics: Keep a checklist in ~/workspace/design-rubric.md
Component library: Reuse existing components, don't rebuild
Screenshot diffs: Compare before/after for visual changes

7. The Control Room: How It All Comes Together

The Simplest Version

Elmar says: "Fix the mobile login button"
    ↓
Luci (Planner) creates ticket MC-4500
    ↓
Luci writes design brief (Designer hat)
    ↓
Luci spawns Claude Code runtime (Builder hat)
    ↓
Claude Code writes code, runs tests
    ↓
Claude Code signals DONE → ticket moves to Review
    ↓
Luci spawns Codex runtime (Reviewer hat) → approves
    ↓
Tessa (Browser + vision) checks 375px screenshot → approves
    ↓
Luci moves ticket to Done, reports to Elmar

What Actually Happens in the System

Ticket created in Mission Control (SQLite/PostgreSQL)
Runtime session created in runtime_sessions table
tmux session spawned: mc-ticket-4500
Claude Code launched inside tmux with MCP config
Work happens — file edits, tests, browser checks
Signal file written: state/mc-signals/MC-4500.json
Harvest reads signal, updates ticket status, records history
Runtime closed or kept warm for next phase

8. Summary: The "One-Pager" for Elmar

Your Virtual Team

Planner (Luci): Routes everything. Uses cheap models (Kimi/Hermes). Owns the board.
Builder (Claude Code): Writes all code. Runs in tmux. Your most expensive and capable worker.
Designer (Luci + brief): Creates design briefs before UI work.
Reviewer (Codex/Claude): Second pair of eyes on code.
QA Coder (Claude + tests): Runs regression scripts, verifies logic.
QA Design (Browser + vision): Checks screenshots, mobile layout.
Browser Tester (Claude + browser tools): Real user-flow validation.

Your Tools

Mission Control: The board. All tickets, all evidence, all history.
Claude Code: Premium coder. Use for code, debug, review, browser testing.
Codex: Second opinion. Good for review and focused tasks.
Hermes/Kimi/GLM: Cheap routing, summaries, triage. Use for Planner work.
Cronjobs: Health checks, polling. Use no_agent scripts.
Subagents (delegate_task): Isolated tasks. Research, council, one-offs.

Your Decision Tree (Memorize This)

Recurring/polling? → Cronjob no_agent
Need file edits or tools? → Claude Code in tmux
Quick summary, no tools? → Direct API (Kimi/GLM)
Isolated focused task? → Subagent (delegate_task)
Everything else? → Main ticket runtime

Your Cost Control

Claude Code = premium, use sparingly for code
Kimi/GLM API = cheap, use for routing and summaries
no_agent cron = almost free, use for polling
Subagents = pay per task, good for bounded work

Your Quality Control

Design brief before UI work
375px screenshot gate before Done
Reviewer on all code changes
Regression tests on all fixes

9. Appendix: Concrete Commands

Spawn a Builder (Claude Code) for a ticket

# In Luci's orchestrator session
tmux new-session -d -s mc-ticket-4500 -c ~/workspace/worktrees/mc-4500
claude --mcp-config ~/workspace/mission-control/mc-coord-mcp-config.json

Spawn a Reviewer (Codex) for code review

tmux new-session -d -s mc-review-4500 -c ~/workspace/worktrees/mc-4500
codex --approval-mode full-auto
# Feed it: git diff + design brief + review checklist

Run a cheap summary (Kimi API)

# Direct API call, no tmux, no tools
import openai
client = openai.OpenAI(base_url="https://api.moonshot.cn/v1", api_key=...)
response = client.chat.completions.create(model="kimi-latest", messages=[...])

Delegate a research task (Hermes subagent)

# In Hermes, use delegate_task
# Spawns isolated session, runs task, returns result to parent

Health check cron (no_agent)

{
  "schedule": "*/5 * * * *",
  "no_agent": true,
  "command": "python3 ~/workspace/mission-control/scripts/health_check.py"
}

Bottom line for Elmar:

Mission Control is your kitchen whiteboard. Claude Code is your head chef. The other models are your prep cooks and line cooks. You (the Planner) decide what gets cooked and who cooks it. The system records everything so you never lose track. Keep Claude on the hard stuff, use cheap models for the easy stuff, and never skip the design brief or the screenshot gate.