Mission Control — how it works, how it compares, what's missing

Live rescan of the running system on Luci (not the stale canonical PKA tree). Benchmarked against 7 public "agent OS / mission control" builds surveyed 2026-05-24.

At a glance: ticket runtime is interactive (subscription, not -p) · MAX_WORKERS hard cap of 2 · Postgres holds the durable board + workflow state · auto-loop off (dev_review_qa built, not auto-spawning).

The one-line verdict

MC already beats every surveyed build on the engine — durable Postgres board, interactive-tmux subscription runtimes, claim-after-spawn race guards + reaper, host-aware verification, a real harvest contract, a single-voice orchestrator, and quality gates (Tessa + council) that none of the others have. The remaining gaps are not capability — they are an unfinished auto-loop, a split console, and a small scheduled-job billing cleanup.

✅ What's solid

Orchestrator control-plane, interactive subscription runtimes, dispatch safety (race guard + reaper + locks), Postgres lifecycle, harvest contract, host-aware verify, Tessa + council gates, provider flexibility.

⚠️ Half-built

The dev_review_qa phase workflow is coded (phases + return_for_fixes requeue), but auto-spawn is disabled. There is no machine-checkable acceptance gate and no round cap, so the code→review→Tessa loop is manual and unreliable.

🔴 Open gaps

No single unified web console / live agent-health grid; ~9 scheduled claude -p jobs exposed to the 15-Jun API billing change; throughput capped at 2 workers.

How Mission Control works now

This is the live runtime loop on Luci.

Intent: Elmar → orchestrator
Decompose: proposal cards → tickets
Pickup: scheduler spawns worker
Work: interactive tmux session
Gate: review / QA / question
Operator: sweep + self-heal
Digest: inbox → orchestrator

Ticket lifecycle (state machine)

todo → pickup → in_progress → needs_input / in_review → done · cancelled

Workers emit line-start signals: QUESTION: → needs_input (runtime kept warm) · REVIEW: → in_review (kept warm) · DONE: → review-ready/close. The loop primitive return_for_fixes requeues the implementation phase and makes the ticket pickupable again. The loop-back exists, but it is triggered manually, not automatically.
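The lifecycle and signal rules above can be sketched as a small transition table plus a line-start parser. This is a minimal illustration, not the code in ticket_runtime.py or mc_pickup.py. The names `State`, `ALLOWED`, `SIGNALS`, `next_state` and `return_for_fixes` are hypothetical, and mapping DONE: straight to done (standing in for "review-ready/close") plus the needs_input → in_progress edge are assumptions.

```python
from enum import Enum


class State(str, Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    NEEDS_INPUT = "needs_input"
    IN_REVIEW = "in_review"
    DONE = "done"
    CANCELLED = "cancelled"


# Hypothetical transition table; "pickup" is the todo -> in_progress edge.
ALLOWED = {
    State.TODO: {State.IN_PROGRESS, State.CANCELLED},
    State.IN_PROGRESS: {State.NEEDS_INPUT, State.IN_REVIEW, State.DONE, State.CANCELLED},
    State.NEEDS_INPUT: {State.IN_PROGRESS, State.CANCELLED},  # assumed: answer resumes work
    # return_for_fixes: in_review -> todo makes the ticket pickupable again.
    State.IN_REVIEW: {State.DONE, State.TODO, State.CANCELLED},
    State.DONE: set(),
    State.CANCELLED: set(),
}

# Signals only count at the start of a line, so "see DONE: below" is ignored.
SIGNALS = {
    "QUESTION:": State.NEEDS_INPUT,
    "REVIEW:": State.IN_REVIEW,
    "DONE:": State.DONE,  # assumption: DONE: closes directly
}


def next_state(current: State, output: str) -> State:
    """Return the state implied by the last line-start signal in `output`."""
    target = current
    for line in output.splitlines():
        for prefix, state in SIGNALS.items():
            if line.startswith(prefix):
                target = state
    if target != current and target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def return_for_fixes(current: State) -> State:
    """Manual loop-back: requeue implementation so pickup can claim it again."""
    if State.TODO not in ALLOWED[current]:
        raise ValueError(f"cannot return {current.value} for fixes")
    return State.TODO
```

Keeping the allowed edges in one table means an unexpected signal (for example DONE: on a ticket still in todo) fails loudly instead of silently skipping the gate.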

Component map (live `mission-control/`)

Control plane

persistent_luci.py — orchestrator (long-lived tmux). semantic_router.py — intent routing. orchestrator_triage.py — auto-triage (Gemini Flash). single-voice

Dispatch & runtime

mc_pickup.py — dispatcher (claim-after-spawn, reaper, MAX_WORKERS=2). ticket_runtime.py — interactive tmux claude. runtime_picker / runtime_pool — profile + warm pool. mc_tmux.py — send_input/capture.

Gates & health

luci_operator.py — board sweep / self-heal. council_runner.py — second opinion. followup_ledger / handoff — continuity. app.py — server, Workbench, harvest, Telegram bridge.
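The Dispatch & runtime entry above names three safety mechanisms: claim-after-spawn, the reaper and the MAX_WORKERS cap. A minimal sketch of one reading of that pattern follows: spawn the runtime first, then claim atomically, and tear the runtime down if the claim loses, so no ticket is ever marked claimed without a live runtime behind it. The in-memory `Board` stands in for the Postgres board, and `dispatch`, `reap`, `spawn`, `kill` and `is_alive` are hypothetical names, not the functions in mc_pickup.py.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

MAX_WORKERS = 2  # mirrors the cap described for mc_pickup.py


@dataclass
class Board:
    """In-memory stand-in for the Postgres board (hypothetical)."""
    todo: Set[str] = field(default_factory=set)
    claims: Dict[str, str] = field(default_factory=dict)  # ticket -> session
    lock: threading.Lock = field(default_factory=threading.Lock)

    def try_claim(self, ticket: str, session: str) -> bool:
        # Atomic compare-and-set; on Postgres this would be one guarded write.
        with self.lock:
            if ticket not in self.todo or len(self.claims) >= MAX_WORKERS:
                return False
            self.todo.discard(ticket)
            self.claims[ticket] = session
            return True


def dispatch(board: Board, ticket: str,
             spawn: Callable[[str], str], kill: Callable[[str], None]) -> Optional[str]:
    """Spawn first, claim second; tear the runtime down if the claim loses."""
    session = spawn(ticket)
    if board.try_claim(ticket, session):
        return session
    kill(session)  # lost the race or hit the cap: no orphaned runtime survives
    return None


def reap(board: Board, is_alive: Callable[[str], bool]) -> Set[str]:
    """Move tickets whose runtime died back to todo so pickup can retry them."""
    with board.lock:
        dead = {t for t, s in board.claims.items() if not is_alive(s)}
        for t in dead:
            del board.claims[t]
            board.todo.add(t)
    return dead
```

The trade-off is an occasional wasted spawn when two dispatchers race, in exchange for never leaving a claimed ticket with nothing running.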

The roles (agents = roles, not processes)

◆

Luci

Orchestrator + default operator. Owns continuity, MC/Luci code, all dispatch.

⚙

Larry

Implementation on external repos (LegalMind / SafairBru / Coolify host).

✓

Tessa

QA / UX validation. Mandatory for UI & 3+ component changes.

🔍

Scott

Research / current-source scout.

🏛

Atlas

Architecture sign-off for PKA/MC system changes.

⚖

Council

Codex+Gemini+Opus+Kimi+GLM. Significant runtime/security/workflow changes.

How MC compares to the other models

7 public builds surveyed. The split is consistent: marketers build pretty consoles over thin engines; engineers build raw orchestration with no governance. MC is the inverse of the marketers (strong engine, plain console) and ahead of the engineers on safety.

Dimension	Mission Control	Hermes / Julian	Alex Finn	IndyDevDan	Auto Claude
Orchestration trigger	Scheduler push + operator sweep	Heartbeat persona poll	OpenClaw heartbeat	Agent spawns sub-agents	Desktop spawns instances
Per-task isolation	● own tmux session + session_id	○ personas, 1 process	○ personas	● sandbox + ctx window	● worktrees
Ticket board	● Postgres, lifecycle-gated	◐ kanban (DB/md)	◐ kanban	◐ event stream	● kanban + GH issues
Dispatch safety	● race guard + reaper + locks	○	○	○	◐
Quality gates	● Tessa + council + Atlas	○ none	○ none	○ none	◐ review agent
Memory	● vault.db + SB + graph	◐ Obsidian bolt-on	◐ journal	◐ per-ctx	◐ Graphiti
Host-aware verify	● resolve + verify on host	○	○	○	○
Auto code→review loop	◐ built, auto-spawn OFF	○	○	◐ SendMessage	◐
Unified web console	◐ Board+Workbench, split	● polished	● polished	◐ observability	● desktop app
Live agent-health grid	◐ data, no at-a-glance grid	●	●	● event pulse	◐
Billing model	● subscription (tmux)	API/credits	API/credits	API	API

● strong ◐ partial ○ absent

Engine maturity vs console polish

The whole market sits on one diagonal. MC is top-left — strongest engine, weakest console. The marketers are bottom-right. The opportunity is to move MC right without losing the engine.

[Chart: engine maturity (vertical ▲) vs console polish (horizontal ▶). Plotted: Mission Control, Alex Finn, Hermes / Julian, IndyDevDan, Auto Claude, CC Agent View; a ↗ arrow marks MC's target direction.]

Gaps & what I propose

Ordered by leverage. Each gap is grounded in the live code.

1 · The code→review→Tessa loop doesn't auto-run (severity: high)

evidence: app.py:4473 dev_review_qa phases defined · app.py:6150 "auto-spawn currently disabled by should_auto_workflow" · return_for_fixes exists but manual

The phase workflow and the requeue primitive are already coded. What's missing is automatic progression and a stop condition. Today one agent is asked to act as the loop controller, and that never holds.

Propose: Enable bounded dev_review_qa auto-spawn behind three guards: (a) a machine-checkable acceptance gate per phase (tests pass / build green / Tessa PASS — not "until Opus is happy"); (b) a max-round cap (e.g. 3) → on exceed, park needs_input + ping; (c) the existing circuit-breaker (≥2 same-root-cause failures → freeze + audit). Coder = codex profile, reviewer = Opus, QA = Tessa browser — each a bounded role runtime, continuity via branch/artifacts not chat.
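The three guards in this proposal can be sketched as one bounded controller. This is a hypothetical illustration, not existing MC code: `PhaseResult`, `run_loop`, the `root_cause` fingerprint and the return strings are assumed names. Each phase returns a machine-checkable pass/fail (guard a); a failure restarts the round from the coder phase, mirroring return_for_fixes; `MAX_ROUNDS` is the round cap (guard b); and a repeated root cause trips the breaker (guard c).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MAX_ROUNDS = 3          # (b) round cap from the proposal
BREAKER_THRESHOLD = 2   # (c) >=2 same-root-cause failures -> freeze + audit


@dataclass
class PhaseResult:
    passed: bool                      # (a) tests pass / build green / Tessa PASS
    root_cause: Optional[str] = None  # failure fingerprint; None when passed


def run_loop(phases: List[Callable[[int], PhaseResult]]) -> str:
    """Drive code -> review -> QA rounds until every gate passes or a guard trips.

    Returns "done", "needs_input" (round cap hit) or "frozen" (circuit breaker).
    """
    failures: Dict[str, int] = {}
    for round_no in range(1, MAX_ROUNDS + 1):
        for phase in phases:
            result = phase(round_no)
            if result.passed:
                continue
            cause = result.root_cause or "unknown"
            failures[cause] = failures.get(cause, 0) + 1
            if failures[cause] >= BREAKER_THRESHOLD:
                return "frozen"   # freeze + audit instead of retrying blindly
            break                 # return_for_fixes: restart from the coder phase
        else:
            return "done"         # every phase gate passed this round
    return "needs_input"          # park the ticket and ping the operator
```

The breaker is checked before the round cap on purpose: the same failure twice signals a systemic problem, and spending the remaining rounds on it only burns a worker slot.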

2 · No single unified console / live agent-health grid (severity: medium)

evidence: surfaces split across Board · Runtime Workbench · dashboard chat · Telegram · PKA :8787 (functional, not designed)

Every surveyed build's one real advantage is a dark 3-zone console (sidebar · main · live-activity rail) with an at-a-glance agent grid. MC has all the data, scattered.

Propose: A mission-control console view consolidating onto one surface: left nav (Board / Workbench / Agents / Activity), centre = the work, right = a live activity rail, plus an Agents health grid (Luci / Larry / Tessa / Scott + provider status) with the green/yellow/red semaphore + current task + latency. Reuse this dark "mission control" language. Skip the pixel "Office" — every reviewer flagged it as gimmick. Tier-2, no new server.
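The green/yellow/red semaphore in the proposed Agents grid reduces to a small classification over data MC already has. The sketch below is an assumption throughout: the `AgentStatus` fields, the `semaphore` and `grid_row` helpers, and both thresholds are illustrative values, not existing MC settings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; tune against real runtime telemetry.
STALE_AFTER_S = 120
SLOW_LATENCY_MS = 5_000


@dataclass
class AgentStatus:
    name: str                    # e.g. Luci, Larry, Tessa, Scott
    last_heartbeat_s: float      # seconds since the runtime last reported
    latency_ms: Optional[float]  # last provider round-trip; None if unknown
    provider_up: bool
    current_task: Optional[str] = None


def semaphore(a: AgentStatus) -> str:
    """Collapse one agent's telemetry into a green/yellow/red grid cell."""
    if not a.provider_up or a.last_heartbeat_s > STALE_AFTER_S:
        return "red"
    if a.latency_ms is None or a.latency_ms > SLOW_LATENCY_MS:
        return "yellow"
    return "green"


def grid_row(a: AgentStatus) -> str:
    """One row of the at-a-glance grid: status, agent, task, latency."""
    task = a.current_task or "idle"
    lat = "n/a" if a.latency_ms is None else f"{a.latency_ms:.0f} ms"
    return f"{semaphore(a):6} {a.name:6} {task:12} {lat}"
```

Treating an unknown latency as yellow rather than green keeps a runtime that has stopped reporting from looking healthy.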

3 · Scheduled `claude -p` jobs exposed to the 15-Jun API billing change (severity: medium · time-boxed)

evidence: ~9 task definitions shell out to claude -p: life-manager-digest/scan, morning-briefing, b4i-fuel-history, claude-mem-value-eval, agent-watch, legalmind-wa-watcher, provider-smoke, self-improve-luci-weekly. (Ticket work is already subscription/interactive, so it is not exposed.)

From 15 June, headless -p bills to the separate API account. Tickets are safe; these scheduled jobs are the only exposure.

Propose: Audit the 9 → quality-critical ones (briefings, self-improve) move to a subscription-interactive runner; the rest (scans, smoke, digests) drop to GLM / Gemini Flash via the existing provider-switch. Neutralises the change for cents. Do before 15 Jun.
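The audit result can be captured as a simple routing table. In this sketch the nine job names come from the evidence line above, while the `QUALITY_CRITICAL` set (chosen from the proposal's "briefings, self-improve" examples) and the runner labels are assumptions; the real split is whatever the audit decides.

```python
from typing import Dict, List

# The nine jobs named in the evidence line.
JOBS = [
    "life-manager-digest", "life-manager-scan", "morning-briefing",
    "b4i-fuel-history", "claude-mem-value-eval", "agent-watch",
    "legalmind-wa-watcher", "provider-smoke", "self-improve-luci-weekly",
]

# Hypothetical classification; the audit decides the real list.
QUALITY_CRITICAL = {"morning-briefing", "self-improve-luci-weekly"}


def route(job: str) -> str:
    """Pick a runner so no scheduled job still shells out to `claude -p`."""
    if job in QUALITY_CRITICAL:
        return "interactive-subscription"  # same tmux path tickets already use
    return "cheap-provider"                # GLM / Gemini Flash via provider switch


def plan(jobs: List[str]) -> Dict[str, List[str]]:
    """Group jobs by target runner, preserving the input order."""
    out: Dict[str, List[str]] = {"interactive-subscription": [], "cheap-provider": []}
    for job in jobs:
        out[route(job)].append(job)
    return out
```

Defaulting to the cheap provider means a newly added job cannot silently land on the API-billed path.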

4 · Throughput capped at 2 workers; single-lane stall risk (severity: low)

evidence: mc_pickup.py MAX_WORKERS = 2 (lowered 3→2) · runtime_pool.py exists

Fine for safety today, but a ceiling once the auto-loop runs (each loop holds a slot through multiple phases).

Propose: Tie worker count to the runtime model — a small warm pool of per-role slots (e.g. 2 coder + 1 review + 1 Tessa) rather than one global cap, with the operator sweep watching for a wedged slot. Revisit cap after the auto-loop lands.
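A per-role pool is a small change from one global counter. The sketch below uses the proposal's example split (2 coder + 1 review + 1 Tessa); `RolePool`, `ROLE_SLOTS` and the role keys are hypothetical, and runtime_pool.py may be structured quite differently.

```python
from collections import Counter
from typing import Dict

# Example split from the proposal: 2 coder + 1 review + 1 Tessa (QA).
ROLE_SLOTS: Dict[str, int] = {"coder": 2, "review": 1, "qa": 1}


class RolePool:
    """Per-role capacity instead of one global MAX_WORKERS (hypothetical)."""

    def __init__(self, slots: Dict[str, int]) -> None:
        self.slots = dict(slots)
        self.busy: Counter = Counter()

    def acquire(self, role: str) -> bool:
        """Take a slot in this role's lane; unknown roles get no capacity."""
        if self.busy[role] >= self.slots.get(role, 0):
            return False  # this lane is full; other lanes keep moving
        self.busy[role] += 1
        return True

    def release(self, role: str) -> None:
        if self.busy[role] > 0:
            self.busy[role] -= 1

    def total_capacity(self) -> int:
        return sum(self.slots.values())
```

Separate lanes mean a review run that holds its slot across phases cannot starve the coders, which is the single-lane stall this gap describes.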

5 · Heavy review/QA runtimes reaped mid-loop (severity: low · partly fixed)

evidence: prior incidents (truncated subagent / heavy-runtime harvest) · harvest durability work (MC-3482, MC-3804 MCP-primary harvest) already in tree

Long Tessa/Preview or review runs can be cut off before they write a verdict, so the loop looks stuck.

Propose: Keep the MCP-primary harvest + enforce incremental commit/push per phase so a reaped runtime is reconstructable from git + harvest, never the cut-off return. Already the direction — make it a hard phase rule.
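Making per-phase commits a hard rule lets a reaped runtime's progress be recomputed from git alone. This sketch assumes a hypothetical `MC-Phase: <name>` trailer on each checkpoint commit and assumed phase names; it covers only the git side and does not model the MCP-primary harvest.

```python
import re
from typing import List, Optional, Set

PHASES = ["implement", "review", "qa"]  # dev_review_qa order; names assumed
# Hypothetical commit trailer written at the end of each completed phase.
TRAILER = re.compile(r"^MC-Phase:\s*(\S+)\s*$", re.MULTILINE)


def completed_phases(commit_messages: List[str]) -> Set[str]:
    """Collect phase names recorded by per-phase checkpoint commits."""
    done: Set[str] = set()
    for msg in commit_messages:
        done.update(TRAILER.findall(msg))
    return done


def resume_point(commit_messages: List[str]) -> Optional[str]:
    """First phase with no checkpoint commit; None means the ticket can close.

    A reaped runtime is rebuilt from this, never from its cut-off return value.
    """
    done = completed_phases(commit_messages)
    for phase in PHASES:
        if phase not in done:
            return phase
    return None
```

In practice the messages would come from the ticket branch's log; reading them from git rather than from the runtime's last output is what makes a mid-phase reap harmless.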

Suggested sequence

Now: Gap 3 (billing audit), small and deadline-driven, before 15 Jun; and Gap 1 (auto-loop with acceptance gate + cap), the real unlock, with Atlas + council sign-off first.

Then: Gap 2 (console + agent grid), followed by tuning gaps 4 and 5 once the loop runs.

Scanned live on Luci ~/workspace/mission-control/ 2026-05-24. Comparison from the 6-builder survey (Elmar Inbox/agent-os-survey.md) + Hermes teardown. Severity reflects leverage × effort, not difficulty alone.