Mission Control Recovery PRD + Status Ledger

Status: Draft controller baseline
Owner/controller: Lucienne
System of record: Mission Control tickets + runtime_sessions + canonical docs under /home/lucienne/workspace/mission-control/docs/
Status timestamp: 2026-05-24 17:08 SAST
Related umbrella: MC-4000 — Mission Control awesome recovery
Runtime-independence foundation: MC-3898
Source plan: /home/lucienne/workspace/reports/luci-control-room-runtime-independence-plan.md
Source specs:
/home/lucienne/workspace/mission-control/docs/mission-control-unified-product-spec.md
/home/lucienne/workspace/mission-control/docs/runtime-architecture-refresh.md
/home/lucienne/workspace/mission-control/docs/mission-control-redesign-delivery-process.md

Executive summary

Mission Control has made material progress through the Telegram-driven controller loop. The runtime-independence architecture is now substantially documented and partially enforced in code, the control-room/governance layer exists, Workbench and Tasks/Scheduler have completed meaningful audit/fix cycles, and follow-up/check-in ledger work has landed.

The initiative is not yet complete. The full test suite currently fails during collection, runtime-switch continuity and a low-risk Hermes-through-MC path remain unproven, MC-4049 campaign-owner separation is still in progress, and the page-by-page product recovery loop has not covered all core surfaces with the full evidence standard.

Product north star

Mission Control is Luci's chat-first AI operating cockpit with durable ticket/workflow state underneath. It must feel like one coherent Luci operator while work can be executed by replaceable runtimes: Claude Code, Hermes, Codex, Kimi, Gemini, direct API scripts, browser harnesses, Larry SSH, or future adapters.

Mission Control owns:

tickets and workflow state;
routing intent and campaign ownership;
runtime ledger rows in runtime_sessions;
evidence, comments, history, and status transitions;
scheduled-task visibility;
user-facing operating model.

Runtime CLIs and provider/model profiles are replaceable adapters, not the source of truth.

Status legend

Status	Meaning
DONE	Implemented, documented, tested, reviewed, and signed off for the stated scope.
SHIPPED PARTIAL	Landed and useful, but one or more acceptance gates remain open.
IN PROGRESS	Active implementation/review work exists.
BLOCKED	Cannot complete without resolving a known gate/dependency.
NOT STARTED	Planned but no durable evidence found.

Gate columns:

Gate	Meaning
Docs	Canonical docs/spec/runbook updated.
Code	Implementation landed on current master/live repo.
Tests	Focused and/or suite tests pass for the slice.
Review	Independent non-writer review completed.
Browser/Tessa	Browser user journey and visual acceptance completed where relevant.
Signoff	Lucienne/Atlas/product signoff recorded.
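
The legend and gate columns above can be read as a small data model. The sketch below is illustrative only: `WorkstreamRow`, `GATES`, `SATISFIED`, `open_gates`, and `done_is_consistent` are hypothetical names, not part of Mission Control. It assumes gate cells hold the values used in this ledger (`yes`, `n/a`, `focused`, `partial`, and so on) and checks the one rule the legend states outright: a DONE row should have no open gate.

```python
from dataclasses import dataclass, field

# Gate columns, in the order used by the workstream table.
GATES = ("Docs", "Code", "Tests", "Review", "Browser/Tessa", "Signoff")

# Assumption: these cell values mean "gate satisfied or not applicable".
SATISFIED = {"yes", "n/a", "focused"}


@dataclass
class WorkstreamRow:
    name: str
    status: str
    gates: dict = field(default_factory=dict)  # gate name -> cell text


def open_gates(row):
    """Return gate names whose cell is missing or not clearly satisfied."""
    return [g for g in GATES if row.gates.get(g, "").lower() not in SATISFIED]


def done_is_consistent(row):
    """A row whose status starts with DONE should have no open gates."""
    if not row.status.upper().startswith("DONE"):
        return True  # the rule only constrains DONE rows
    return not open_gates(row)


row = WorkstreamRow(
    name="Example slice",
    status="DONE",
    gates={"Docs": "yes", "Code": "yes", "Tests": "focused",
           "Review": "partial", "Browser/Tessa": "n/a", "Signoff": "yes"},
)
print(open_gates(row))          # ['Review']
print(done_is_consistent(row))  # False
```

Cell values outside the assumed set (for example `read-only` or `claimed`) are reported as open so that a person decides whether they satisfy the gate for the stated scope.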

Current status by workstream

Workstream	Status	Docs	Code	Tests	Review	Browser/Tessa	Signoff	Evidence / notes
Runtime-independence foundation (MC-3898)	SHIPPED PARTIAL	yes	yes	focused	partial	n/a	partial	Adapter metadata landed and MC-3898 itself is done, but runtime-switch continuity and the live Hermes path remain open acceptance checks.
Control-room governance layer	DONE	yes	n/a	read-only	yes	n/a	yes	`/home/lucienne/workspace/agent-control-room/` exists with runtime docs, registry, runbooks, webui governance, Hermes reference notes.
Runtime adapter metadata	SHIPPED PARTIAL	yes	yes	focused	partial	n/a	partial	`persistent_luci.py` records harness-independent metadata: adapter contract version, harness, resolved executable, redacted argv/command, profile/provider/model/cwd/provenance.
Runtime profile portability	SHIPPED PARTIAL	yes	yes	focused	partial	n/a	partial	Profiles exist for Claude/Hermes/Codex/etc. Direct API/tool-capability boundaries documented. Low-risk switch/Hermes session validation still needed.
Unified MC product spec	DONE as baseline	yes	n/a	n/a	yes	n/a	yes	`mission-control-unified-product-spec.md` consolidates north star, entities, page model, visual principles, and delivery governance.
Delivery/governance process	DONE as baseline	yes	n/a	n/a	yes	n/a	yes	`mission-control-redesign-delivery-process.md` defines discovery, coder lanes, review, Tessa, Atlas/signoff, and Lucienne evidence inspection.
Workbench/Console audit + Lane A (MC-4030/MC-4031)	SHIPPED PARTIAL	yes	yes	focused	yes	claimed	partial	Audit found 14 issues and Lane A fixes were recorded as shipped. Needs fresh regression/browser pass after suite health is restored.
Tasks/Scheduler audit + Lane A/B (MC-4045/MC-4056)	DONE for current slice	yes	yes	focused	yes	yes	yes	Ticket comments record live restart, desktop/mobile Tessa QA, corrected counts/reap evidence, Lane B hardening.
Expected check-in / follow-up ledger (MC-4060)	DONE for current slice	yes	yes	focused	partial	n/a	yes	`followup_ledger.py`, migration 016, sweep task and helper landed; ticket closed.
Campaign/orchestrator owner separate from executor (MC-4049)	IN PROGRESS	partial	branch/scratch	pending	pending	pending	pending	Important control-plane feature. Needs fresh integration, tests, review, browser/UX check where visible, and explicit signoff before live restart/deploy.
Full-suite verification	BLOCKED	n/a	n/a	no	n/a	n/a	no	`pytest` currently fails during collection on `shared_console.blueprint`; focused subset also has workflow idempotency and terminal reconciliation failures.
Runtime-switch continuity proof	NOT STARTED / OPEN	runbook yes	not proven	no	no	n/a	no	Original plan acceptance remains unchecked.
Low-risk Hermes-through-MC proof	NOT STARTED / OPEN	runbook yes	not proven	no	no	n/a	no	Needed before changing production runtime routing or declaring Hermes adapter path mature.
Board page recovery	OPEN	spec baseline	unknown	unknown	no	no	no	Should be next core product surface after test/campaign-owner gates.
Home/chat shell recovery	OPEN	spec baseline	partial existing	unknown	no	no	no	Chat-first shell is core UX; needs page contract, audit, implementation loop, browser/mobile evidence.
Global visual/design acceptance	OPEN	baseline yes	partial	n/a	no	partial	no	Visual quality is a first-class requirement. Not yet complete across core surfaces.

Requirements status

Functional requirements

ID	Requirement	Priority	Status	Notes
FR-1	MC remains source of truth for tickets, runtime state, workflow history, and evidence.	P0	SHIPPED PARTIAL	Strong docs and runtime ledger work exist; full behaviour still needs suite/browser validation.
FR-2	Runtime CLIs/providers/models are replaceable adapters under MC control.	P0	SHIPPED PARTIAL	Adapter metadata exists; continuity proof still open.
FR-3	Control room is governance/navigation only, not a second database.	P0	DONE	Implemented as docs/runbooks with links to authoritative state.
FR-4	Runtime sessions record adapter contract metadata with redacted invocation details.	P0	SHIPPED PARTIAL	Implemented for CLI launches; keep expanding tests around preservation/reconciliation.
FR-5	Direct API profiles are blocked/hidden where tool-capable runtimes are required.	P0	SHIPPED PARTIAL	Documented and appears represented in profile policy; keep regression coverage.
FR-6	Workbench is the execution cockpit for active ticket work.	P0	SHIPPED PARTIAL	Audit/fix loop happened; fresh end-to-end browser validation needed after test repairs.
FR-7	Tasks/Scheduler outcomes are visible rather than hidden in cron.	P0	DONE for current slice	Tasks page hardening and follow-up ledger improve this materially.
FR-8	Campaign controller/orchestrator ownership is distinct from executor assignment.	P0	IN PROGRESS	MC-4049 is the active gap.
FR-9	Page delivery follows audit -> implementation -> review -> Tessa -> signoff.	P0	PARTIAL	Proven on some slices, not yet universal for MC-4000.
FR-10	Runtime switching preserves readable ticket history and old runtime evidence.	P0	OPEN	Explicit live proof still needed.
FR-11	Low-risk Hermes profile/session can run through MC with visible status/failure handling.	P1	OPEN	Needed before Hermes adapter maturity claims.
FR-12	Board/Home/Workbench/Tasks all meet the unified product spec and visual bar.	P0	PARTIAL	Tasks strongest; Workbench partial; Board/Home still open.

Non-functional requirements

ID	Requirement	Priority	Status	Notes
NFR-1	Test harness health: full suite collects and focused regression packs pass.	P0	BLOCKED	The full-suite collection error must be fixed before any further broad work.
NFR-2	No secrets or raw commands persisted in durable metadata.	P0	SHIPPED PARTIAL	Redacted argv/command implemented; keep audit coverage.
NFR-3	Runtime/profile terminology remains consistent.	P0	DONE baseline	Docs normalize runtime/provider/model/profile.
NFR-4	UI changes are visually coherent on desktop and mobile.	P0	PARTIAL	Tasks has evidence; global visual acceptance remains open.
NFR-5	Live deploy/restart gates are explicit and not implied by code merge.	P0	PARTIAL	Telegram process respected gates in recent slices; codify this into PRD acceptance.
NFR-6	Controller loop does not ask Elmar technical implementation questions.	P1	PARTIAL	Recent process mostly followed this; keep as operating rule.
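
NFR-2 and FR-4 depend on invocation details being redacted before they are stored. The sketch below shows one way such redaction could work. It is not the `persistent_luci.py` implementation: `redact_argv`, `SECRET_FLAGS`, and the flag names in it are assumptions chosen for illustration.

```python
import re

# Assumption: flags whose values are secret; a real list would come from policy.
SECRET_FLAGS = {"--api-key", "--token", "--password"}

# NAME=value tokens whose NAME ends in a secret-looking suffix.
SECRET_ASSIGN = re.compile(
    r"^([A-Za-z_][A-Za-z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD))=.+$",
    re.IGNORECASE,
)
REDACTED = "[REDACTED]"


def redact_argv(argv):
    """Return a copy of argv with secret values replaced by a placeholder."""
    out = []
    hide_next = False
    for arg in argv:
        if hide_next:  # value belonging to a preceding secret flag
            out.append(REDACTED)
            hide_next = False
            continue
        flag, sep, _value = arg.partition("=")
        if flag in SECRET_FLAGS:
            if sep:  # --token=value form
                out.append(f"{flag}={REDACTED}")
            else:  # --token value form: hide the next argument
                out.append(flag)
                hide_next = True
            continue
        match = SECRET_ASSIGN.match(arg)
        out.append(f"{match.group(1)}={REDACTED}" if match else arg)
    return out


print(redact_argv(["runtime-cli", "--api-key", "sk-123", "--model", "m1"]))
# ['runtime-cli', '--api-key', '[REDACTED]', '--model', 'm1']
```

An allow-list of known-safe arguments would be stricter than this deny-list; the deny-list form is used here only because it is shorter to show.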

Open blockers and risks

Full pytest is not usable as a global gate yet.
    Current collection blocker: tests/test_console_panel.py imports shared_console.blueprint, but shared_console is not present as a package in the inspected checkout.
    Focused runtime/workflow subset had 4 failures in workflow idempotency and terminal-state reconciliation.
Runtime independence is architecturally sound but not fully proven operationally.
    Runtime-switch continuity needs a low-risk test ticket/session.
    Hermes-through-MC needs a low-risk live path proof.
MC-4049 is an important control-plane dependency.
    Campaign owner vs executor needs to land cleanly before the broader controller model can be considered first-class.
Evidence is distributed.
    Current truth is split across docs, ticket comments, commits, Telegram/controller notes, and test logs.
    MC-4000 needs an evidence index so that future agents do not have to re-discover the same status.
Visual acceptance is not complete across the product.
    Tasks has strong evidence.
    Workbench needs a fresh pass after tests stabilize.
    Board and Home remain open core surfaces.
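
The collection blocker above comes down to a top-level package that cannot be found at import time. A minimal sketch of a pre-flight check follows; `missing_top_level_packages` is a hypothetical helper, and it assumes the interpreter's `sys.path` matches what pytest would use, which may not hold if conftest files or rootdir settings alter the path.

```python
import importlib.util


def missing_top_level_packages(module_names):
    """Return the dotted names whose top-level package cannot be located."""
    missing = []
    for name in module_names:
        top_level = name.split(".", 1)[0]
        # find_spec returns None when a top-level module is not importable.
        if importlib.util.find_spec(top_level) is None:
            missing.append(name)
    return missing


# 'json' ships with Python; the second name is deliberately nonexistent.
print(missing_top_level_packages(["json.decoder", "no_such_pkg_for_demo.blueprint"]))
# ['no_such_pkg_for_demo.blueprint']
```

Running `pytest --collect-only -q` remains the authoritative check, since it exercises the same import path the suite uses.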

Design/spec updates needed

The conceptual design does not need a pivot. The core architecture is still right: MC is the control plane; runtimes are replaceable adapters.

The plan/design should be updated in four ways:

1. Add a PRD/status ledger as a canonical companion to the original plan.
    The original plan is good for history and architecture intent.
    The PRD/status ledger should answer: what is complete, what was tested, what was reviewed, what was browser-verified, what is signed off, and what remains open.
2. Add an explicit verification ladder, lowest rung first.
    Documented only.
    Code landed.
    Focused tests pass.
    Full suite collects/passes or known failures are explicitly waived.
    Independent review complete.
    Browser/Tessa complete where UI-facing.
    Atlas/system/product signoff complete.
    Live deployed/restarted if relevant.
3. Promote campaign ownership into the product model.
    `campaign_owner` / controller ownership should be a first-class MC concept, separate from the `assigned_to` executor.
    MC-4049 should update docs/specs after it lands.
4. Add a global safety gate: no broad MC completion claim while full test collection is broken.
    Local/live urgent fixes can still proceed with explicit approval and focused evidence.
    But MC-4000 cannot close while the suite cannot collect.
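
The verification ladder and the global safety gate can be expressed as an ordered enumeration. This is a sketch under stated assumptions: `Rung`, `highest_rung`, and `may_claim_broad_completion` are hypothetical names, and the ladder is treated as strictly ordered, with "where UI-facing" and "if relevant" rungs passed in as not applicable.

```python
from enum import IntEnum


class Rung(IntEnum):
    DOCUMENTED = 1
    CODE_LANDED = 2
    FOCUSED_TESTS = 3
    FULL_SUITE = 4          # collects/passes, or known failures explicitly waived
    INDEPENDENT_REVIEW = 5
    BROWSER_TESSA = 6       # only where UI-facing
    SIGNOFF = 7
    LIVE_DEPLOYED = 8       # only if relevant


def highest_rung(passed, not_applicable=frozenset()):
    """Climb from the bottom; stop at the first rung that is neither passed nor n/a."""
    reached = None
    for rung in Rung:  # iterates in definition order
        if rung in not_applicable:
            continue
        if rung not in passed:
            break
        reached = rung
    return reached


def may_claim_broad_completion(passed):
    """Global safety gate: no broad completion claim without the full-suite rung."""
    return Rung.FULL_SUITE in passed


passed = {Rung.DOCUMENTED, Rung.CODE_LANDED, Rung.FOCUSED_TESTS, Rung.INDEPENDENT_REVIEW}
print(highest_rung(passed).name)           # FOCUSED_TESTS
print(may_claim_broad_completion(passed))  # False
```

In the example, review is complete but does not raise the slice above the focused-tests rung, because the full-suite rung beneath it is still open.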

Immediate recommended next sequence

1. Stabilize verification first.
    Fix the full-suite collection blocker.
    Fix or explicitly triage the focused workflow/runtime failures.
    Produce a current test baseline.
2. Finish MC-4049.
    Integrate on current master.
    Run focused tests.
    Get independent review.
    Browser/UX-check any visible owner/assignee changes.
    Record signoff before live restart/deploy.
3. Prove runtime-independence operationally.
    Low-risk runtime-switch continuity test.
    Low-risk Hermes profile/session through MC.
    Record runtime_sessions metadata and ticket-history evidence.
4. Continue page-by-page recovery.
    Board next.
    Home/chat shell after Board.
    Revisit Workbench after test health improves.
5. Build MC-4000 evidence index.
    Link every completed slice to: audit doc, ticket, commits, tests, review, browser/Tessa evidence, signoff, and deferrals.
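
One possible shape for an evidence-index entry is sketched below. `EvidenceEntry` and `missing_links` are hypothetical; the field names simply mirror the link types listed above, and the values are free-form references (paths, ticket IDs, commit hashes) rather than any existing MC schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidenceEntry:
    slice_name: str
    audit_doc: Optional[str] = None
    ticket: Optional[str] = None
    commits: Optional[str] = None
    tests: Optional[str] = None
    review: Optional[str] = None
    browser_evidence: Optional[str] = None
    signoff: Optional[str] = None
    deferrals: Optional[str] = None


def missing_links(entry):
    """Return the names of link fields that are still empty."""
    return [
        f.name
        for f in fields(entry)
        if f.name != "slice_name" and not getattr(entry, f.name)
    ]


entry = EvidenceEntry(slice_name="Tasks/Scheduler", ticket="MC-4045")
print(missing_links(entry))
# ['audit_doc', 'commits', 'tests', 'review', 'browser_evidence', 'signoff', 'deferrals']
```

A slice with no deferrals would need an explicit value such as "none" so that an empty field always means "not yet linked".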

Acceptance criteria for closing MC-4000

MC-4000 should not close until all of these are true:

[ ] Full pytest suite collects; remaining failures, if any, are documented and explicitly accepted or ticketed.
[ ] Runtime-independence acceptance checks pass:
    [ ] low-risk runtime switch preserves ticket history and old runtime evidence;
    [ ] low-risk Hermes-through-MC session path works or fails visibly with correct runtime failure classification.
[ ] MC-4049 campaign-owner/controller separation is landed, reviewed, and documented.
[ ] Core pages have status rows in this PRD with evidence links: Home, Board, Workbench, Tasks, Ticket detail, Settings/runtime profiles, Console/raw escape hatch.
[ ] Tasks/Scheduler evidence remains green after latest master/live state.
[ ] Workbench has a fresh desktop/mobile browser pass after verification health is restored.
[ ] Board and Home/chat shell complete the page delivery loop or have explicit child tickets with accepted deferral reasons.
[ ] Evidence index exists for MC-4000.
[ ] Lucienne can truthfully say: implementation was done by workers, independently reviewed, fixed after review, browser-tested where relevant, signed off by system/product gate, and personally inspected.
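
Because the criteria above are written as checkboxes, closure readiness can be checked mechanically. The sketch below is one way to do that; `open_items` and `can_close` are hypothetical helpers, and it assumes a checked item is written `[x]` (this ledger currently contains only unchecked `[ ]` items).

```python
import re

# Matches "[ ] text" or "[x] text", with optional leading indentation.
CHECKBOX = re.compile(r"^\s*\[( |x|X)\]\s+(.*)$")


def open_items(ledger_text):
    """Return the text of every unchecked '[ ]' item."""
    items = []
    for line in ledger_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


def can_close(ledger_text):
    """True only if at least one checkbox exists and none is unchecked."""
    has_boxes = any(CHECKBOX.match(line) for line in ledger_text.splitlines())
    return has_boxes and not open_items(ledger_text)


sample = "[x] Evidence index exists for MC-4000.\n[ ] Full pytest suite collects."
print(open_items(sample))  # ['Full pytest suite collects.']
print(can_close(sample))   # False
```

Requiring at least one checkbox keeps an empty or truncated ledger from reading as "ready to close".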

Decision recommendation

Yes: update the original plan, but do not turn the original plan into the live tracker. Keep the original as the historical architecture plan and add a pointer/status addendum to this PRD/status ledger.

This PRD/status ledger should become the working controller artifact for MC-4000 until the recovery campaign closes.