Mission Control — how it works, how it compares, what's missing
Live rescan of the running system on Luci (not the stale canonical PKA tree). Benchmarked against 7 public "agent OS / mission control" builds surveyed 2026-05-24.
The one-line verdict
MC already beats every surveyed build on the engine — durable Postgres board, interactive-tmux subscription runtimes, claim-after-spawn race guards + reaper, host-aware verification, a real harvest contract, a single-voice orchestrator, and quality gates (Tessa + council) that none of the others have. The remaining gaps are not capability — they are an unfinished auto-loop, a split console, and a small scheduled-job billing cleanup.
✅ What's solid
Orchestrator control-plane, interactive subscription runtimes, dispatch safety (race guard + reaper + locks), Postgres lifecycle, harvest contract, host-aware verify, Tessa + council gates, provider flexibility.
⚠️ Half-built
The dev_review_qa phase workflow is coded (phases + return_for_fixes requeue) but auto-spawn is disabled. No machine acceptance gate, no round cap. So the code→review→Tessa loop is manual and unreliable.
🔴 Open gaps
No single unified web console / live agent-health grid; ~9 scheduled claude -p jobs exposed to the 15-Jun API billing change; throughput capped at 2 workers.
How Mission Control works now
This is the live runtime loop on Luci.
Ticket lifecycle (state machine)
Worker emits line-start signals: QUESTION: → needs_input (runtime kept warm) · REVIEW: → in_review (warm) · DONE: → review-ready/close.
Loop primitive: Return for fixes requeues implementation and makes the ticket pickupable again. The loop-back exists, but it is triggered manually, not automatically.
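The signal mapping above can be sketched as a small parser over captured worker output. This is a minimal illustration, not the code in mission-control/: the function names `parse_signal` and `latest_transition` are hypothetical, and the state name `review_ready` is an assumed stand-in for "review-ready/close".

```python
from typing import Optional

# Mapping taken from the lifecycle description; "review_ready" is an
# assumed state name standing in for "review-ready/close".
SIGNAL_TO_STATE = {
    "QUESTION:": "needs_input",
    "REVIEW:": "in_review",
    "DONE:": "review_ready",
}

# Per the description, QUESTION and REVIEW keep the runtime warm.
KEEPS_RUNTIME_WARM = {"needs_input", "in_review"}


def parse_signal(line: str) -> Optional[tuple[str, str]]:
    """Return (next_state, message) if the line starts with a signal.

    Only line-start matches count, so a signal word that appears
    mid-sentence or after indentation is ignored.
    """
    for prefix, state in SIGNAL_TO_STATE.items():
        if line.startswith(prefix):
            return state, line[len(prefix):].strip()
    return None


def latest_transition(captured_output: str) -> Optional[tuple[str, str]]:
    """Scan captured pane text and return the last signal seen, if any."""
    result = None
    for line in captured_output.splitlines():
        hit = parse_signal(line)
        if hit is not None:
            result = hit
    return result
```

Matching only at line start is what keeps ordinary prose such as "we are not DONE: yet" from moving a ticket.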
Component map (live mission-control/)
Control plane
persistent_luci.py — orchestrator (long-lived tmux, single-voice). semantic_router.py — intent routing. orchestrator_triage.py — auto-triage (Gemini Flash).
Dispatch & runtime
mc_pickup.py — dispatcher (claim-after-spawn, reaper, MAX_WORKERS=2). ticket_runtime.py — interactive tmux claude. runtime_picker / runtime_pool — profile + warm pool. mc_tmux.py — send_input/capture.
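One plausible reading of "claim-after-spawn" is: start the runtime first, then claim the ticket with an atomic conditional update, and tear the runtime down if the claim is lost. The sketch below illustrates that reading only. It uses an in-memory SQLite table as a stand-in for the Postgres board, and the schema, status names, and the `pickup`, `claim`, `spawn`, and `kill` names are all assumptions, not the contents of mc_pickup.py.

```python
import sqlite3

MAX_WORKERS = 2  # value reported for mc_pickup.py


def make_board() -> sqlite3.Connection:
    """Build a toy board (assumed schema) with one ready ticket."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)"
    )
    conn.execute("INSERT INTO tickets (id, status) VALUES (1, 'ready')")
    conn.commit()
    return conn


def active_workers(conn: sqlite3.Connection) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM tickets WHERE status = 'in_progress'"
    ).fetchone()
    return row[0]


def claim(conn: sqlite3.Connection, ticket_id: int, session: str) -> bool:
    """Atomic claim: succeeds only if the ticket is still 'ready'."""
    cur = conn.execute(
        "UPDATE tickets SET status = 'in_progress', claimed_by = ? "
        "WHERE id = ? AND status = 'ready'",
        (session, ticket_id),
    )
    conn.commit()
    return cur.rowcount == 1


def pickup(conn, ticket_id: int, session: str, spawn, kill) -> bool:
    """Spawn first, claim second, clean up if the claim was lost."""
    if active_workers(conn) >= MAX_WORKERS:
        return False                     # worker ceiling reached
    spawn(session)                       # 1. start the runtime
    if claim(conn, ticket_id, session):  # 2. claim only after it is up
        return True
    kill(session)                        # 3. lost the race: tear down
    return False
```

Under this reading, a ticket is never marked in progress without a live runtime behind it. The cost is an occasional wasted spawn, which the loser cleans up itself.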
Gates & health
luci_operator.py — board sweep / self-heal. council_runner.py — second opinion. followup_ledger / handoff — continuity. app.py — server, Workbench, harvest, Telegram bridge.
The roles (agents = roles, not processes)
How MC compares to the other models
7 public builds surveyed. The split is consistent: marketers build pretty consoles over thin engines; engineers build raw orchestration with no governance. MC is the inverse of the marketers and ahead of the engineers on safety.
| Dimension | Mission Control | Hermes / Julian | Alex Finn | IndyDevDan | Auto Claude |
|---|---|---|---|---|---|
| Orchestration trigger | Scheduler push + operator sweep | Heartbeat persona poll | OpenClaw heartbeat | Agent spawns sub-agents | Desktop spawns instances |
| Per-task isolation | ● own tmux session + session_id | ○ personas, 1 process | ○ personas | ● sandbox + ctx window | ● worktrees |
| Ticket board | ● Postgres, lifecycle-gated | ◐ kanban (DB/md) | ◐ kanban | ◐ event stream | ● kanban + GH issues |
| Dispatch safety | ● race guard + reaper + locks | ○ | ○ | ○ | ◐ |
| Quality gates | ● Tessa + council + Atlas | ○ none | ○ none | ○ none | ◐ review agent |
| Memory | ● vault.db + SB + graph | ◐ Obsidian bolt-on | ◐ journal | ◐ per-ctx | ◐ Graphiti |
| Host-aware verify | ● resolve + verify on host | ○ | ○ | ○ | ○ |
| Auto code→review loop | ◐ built, auto-spawn OFF | ○ | ○ | ◐ SendMessage | ◐ |
| Unified web console | ◐ Board+Workbench, split | ● polished | ● polished | ◐ observability | ● desktop app |
| Live agent-health grid | ◐ data, no at-a-glance grid | ● | ● | ● event pulse | ◐ |
| Billing model | ● subscription (tmux) | API/credits | API/credits | API | API |
Legend: ● strong · ◐ partial · ○ absent
Engine maturity vs console polish
The whole market sits on one diagonal. MC is top-left — strongest engine, weakest console. The marketers are bottom-right. The opportunity is to move MC right without losing the engine.
Gaps & what I propose
Ordered by leverage. Each gap is grounded in the live code.
1 · The code→review→Tessa loop doesn't auto-run [high]
dev_review_qa phases defined · app.py:6150 "auto-spawn currently disabled by should_auto_workflow" · return_for_fixes exists but is manual
The phase workflow and the requeue primitive are already coded. What's missing is automatic progression and a stop condition, so today one agent is asked to be the loop controller, which never holds.
Proposal: enable dev_review_qa auto-spawn behind three guards: (a) a machine-checkable acceptance gate per phase (tests pass / build green / Tessa PASS, not "until Opus is happy"); (b) a max-round cap (e.g. 3) → on exceed, park needs_input + ping; (c) the existing circuit-breaker (≥2 same-root-cause failures → freeze + audit). Coder = codex profile, reviewer = Opus, QA = Tessa browser. Each is a bounded role runtime, with continuity carried by branch/artifacts, not chat.
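The three guards can be sketched as a small loop controller. This is an illustration under assumptions, not existing Mission Control code: `GateResult`, `LoopOutcome`, `run_loop`, and the `root_cause` label are hypothetical names, and `run_round` stands in for one full code→review→QA pass that ends in a machine-checkable verdict.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ROUNDS = 3        # example cap from the proposal
SAME_CAUSE_LIMIT = 2  # circuit-breaker threshold from the proposal


@dataclass
class GateResult:
    passed: bool          # machine-checkable verdict (tests, build, Tessa PASS)
    root_cause: str = ""  # hypothetical short label for why the gate failed


@dataclass
class LoopOutcome:
    status: str           # "done" | "needs_input" | "frozen"
    rounds: int
    history: list = field(default_factory=list)


def run_loop(run_round: Callable[[int], GateResult]) -> LoopOutcome:
    """Drive rounds until the gate passes or a guard stops the loop."""
    causes: dict[str, int] = {}
    history: list = []
    for round_no in range(1, MAX_ROUNDS + 1):
        result = run_round(round_no)
        history.append(result)
        if result.passed:                          # guard (a): acceptance gate
            return LoopOutcome("done", round_no, history)
        causes[result.root_cause] = causes.get(result.root_cause, 0) + 1
        if causes[result.root_cause] >= SAME_CAUSE_LIMIT:
            # guard (c): same root cause repeated, freeze for audit
            return LoopOutcome("frozen", round_no, history)
    # guard (b): round cap exceeded, park the ticket and ping
    return LoopOutcome("needs_input", MAX_ROUNDS, history)
```

The controller itself holds no model context. It only counts rounds and compares verdicts, which is what lets it stop reliably where a single agent acting as loop controller does not.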
2 · No single unified console / live agent-health grid [medium]
Every surveyed build's one real advantage is a dark 3-zone console (sidebar · main · live-activity rail) with an at-a-glance agent grid. MC has all the data, scattered.
3 · Scheduled claude -p jobs exposed to 15-Jun API billing [medium · time-boxed]
Scheduled jobs running claude -p: life-manager-digest/scan, morning-briefing, b4i-fuel-history, claude-mem-value-eval, agent-watch, legalmind-wa-watcher, provider-smoke, self-improve-luci-weekly. (Ticket work is already subscription/interactive, so it is not exposed.)
From 15 June, headless -p bills to the separate API account. Tickets are safe; these scheduled jobs are the only exposure.
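A billing audit of this kind could start from a simple scan of the scheduled command lines. The sketch below flags any command that invokes `claude` with the `-p` flag. The helper names `is_headless_claude` and `exposed_jobs` are hypothetical, and how the real jobs are defined (cron, systemd, or something else) is not stated here, so the input is assumed to be a plain mapping of job name to command string.

```python
import shlex


def is_headless_claude(command: str) -> bool:
    """True if the command line runs `claude` with the `-p` flag."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: leave for manual review
    for i, tok in enumerate(tokens):
        # Match a bare `claude` or an absolute path ending in /claude.
        if (tok == "claude" or tok.endswith("/claude")) and "-p" in tokens[i + 1:]:
            return True
    return False


def exposed_jobs(jobs: dict[str, str]) -> list[str]:
    """Return the sorted names of jobs whose command is headless claude."""
    return sorted(name for name, cmd in jobs.items() if is_headless_claude(cmd))
```

Interactive tmux launches of `claude` without `-p` are not flagged, which matches the distinction drawn above between ticket work and scheduled jobs.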
4 · Throughput capped at 2 workers; single-lane stall risk [low]
MAX_WORKERS = 2 (lowered 3→2) · runtime_pool.py exists
Fine for safety today, but a ceiling once the auto-loop runs (each loop holds a slot through multiple phases).
5 · Heavy review/QA runtimes reaped mid-loop [low · partly fixed]
Long Tessa/Preview or review runs can truncate before writing a verdict → loop looks stuck.
Suggested sequence
1. Gap 3 (billing audit): small, deadline-driven, before 15 Jun.
2. Gap 1 (auto-loop with acceptance gate + cap): the real unlock. Atlas + council first.
3. Gap 2 (console + agent grid), then tune 4/5 once the loop runs.
Scanned live on Luci ~/workspace/mission-control/ 2026-05-24. Comparison from the 6-builder survey (Elmar Inbox/agent-os-survey.md) + Hermes teardown. Severity reflects leverage × effort, not difficulty alone.