Date: 2026-05-31
Reviewer: Lucienne (architectural review)
Status: DECISION REQUIRED — Recommendation provided
Recommend Option 1: Commit fully to the Control Room model.
The hybrid state is already causing confusion, burning tokens, and creating operational drag. The data shows the old runtime model is effectively dormant for Luci (0 active runtime sessions, 0 open Luci tickets in runtime, mc_pickup.py not dispatching Luci tickets), yet the infrastructure still exists, creates maintenance burden, and complicates the mental model. Larry's work can be folded into the Control Room dispatch path without losing capability.
The decisive factor: Elmar designed the Control Room model as the intended architecture. The only reason to keep the hybrid is fear of migration cost — but the migration is smaller than it appears because the runtime is already largely inactive for the main orchestrator path.
| Metric | Value | Interpretation |
|---|---|---|
| Active runtime_sessions | 0 | No live ticket workers running |
| runtime_sessions (completed) | 177,290 | Historical record, not active infrastructure |
| runtime_sessions (failed/stale) | 770 + 180 | Cleanup debt, not active work |
| Luci open tickets in MC | 3 (MC-3930 waiting, MC-4464 todo, MC-4193 waiting) | All manageable by Control Room watcher |
| Larry open tickets in MC | 0 | No active Larry runtime work currently |
| mc_pickup.py | Exists, 7,971 lines, 120 functions | Heavy legacy codebase |
| Luci Control Room watcher | d237c9eb2a7c, enabled, */5 min, no_agent script | Active control room path |
| Old mc-board-shepherd-5min | 7fa17b6a8bad, disabled since 2026-05-29 | Old runtime path already shut down |
| MC control-plane watchdog | b35ca4611b00, enabled, every 15m, no_agent | Healthy infrastructure monitoring |
| Iris jobs | 8 enabled cron jobs, all no_agent or LLM-driven | Iris already operates on notification/event model |
| tmux sessions | mc-root: 2 windows | Minimal residual runtime footprint |
Key insight: The system is already mostly running in Control Room mode for Luci. The old mc-board-shepherd-5min (the full-LLM runtime orchestrator) was disabled on 2026-05-29. The only remaining runtime activity is residual tmux sessions and the mc_pickup.py codebase itself.
Why this is correct:
- Aligns with Elmar's explicit design intent
- The system is already ~90% migrated — finish the job
- Eliminates the confusing dual-path mental model
- Removes 7,971 lines of legacy dispatcher code (mc_pickup.py)
- Lets the orchestrator (Luci) own all routing decisions
- Enables consistent behavior across Luci, Iris, and Lucienne
Why the cons are manageable:
- "Lose existing runtime infrastructure" — the infrastructure is already dormant. The 177K completed sessions are historical data, not active capability.
- "Need to rebuild worker dispatch" — Luci's Control Room watcher already spawns external workers (Codex CLI, Claude Code, subagents). The dispatch capability exists; it's just not going through mc_pickup.py.
- Larry's coding tasks can be dispatched the same way: Luci creates a ticket, spawns a worker in a tmux session or SSH session, and tracks it via comments/status.
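As a rough illustration of the dispatch step described above, the sketch below only builds the `tmux` command line for a detached per-ticket session; it does not run anything. The session naming scheme and the `./run_worker.sh` worker command are hypothetical placeholders, not something this review has confirmed exists.

```python
import shlex


def build_tmux_dispatch(ticket_id: str, worker_cmd: list[str]) -> list[str]:
    """Return the argv that would start a detached tmux session for one ticket."""
    # Hypothetical naming scheme: one session per ticket.
    session = f"worker-{ticket_id}"
    # tmux takes the command to run as a single shell string.
    return ["tmux", "new-session", "-d", "-s", session, shlex.join(worker_cmd)]


# Hypothetical worker entry point; the real worker (Codex CLI, Claude Code, ...) may differ.
argv = build_tmux_dispatch("MC-4464", ["./run_worker.sh", "--ticket", "MC-4464"])
print(argv)
```

Passing the resulting list to `subprocess.run` would start the session; tracking would still happen through MC comments and status changes, as described above.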
Why this is wrong:
- The current "hybrid" is actually just a messy transition state, not a designed architecture
- Every ticket would need a routing decision: "does this go through Control Room or runtime?" This creates confusion and drift
- Two status-change paths mean tickets can get stuck in gaps between systems
- The cost issue (5-min polling) isn't solved; it's just split across two systems
- No clear boundary exists today; creating one is harder than finishing the migration
Why this is wrong:
- Directly contradicts Elmar's design intent
- Loses orchestrator intelligence that has been built and documented (AGENTS.md, delegation control plane, multi-model review)
- Re-introduces the exact problem Elmar wanted to solve: workers talking directly to Elmar, no single coordination layer
- The Control Room model is working; reverting would be a step backward
- Verify Larry's current dispatch path: does he still use mc_pickup.py dispatch, or has he already moved to Control Room handoff?
- If Larry still uses runtime: create a runbook for Luci to spawn Larry workers via SSH/tmux directly (this already exists in AGENTS.md: "SSH into Larry's host from inside your dev-loop session")
Disable residual runtime triggers
- Ensure the ticket-pickup.md and needs-input-pickup.md task files remain disabled (already done per pulse JSON)
- Set a config flag MC_CONTROL_ROOM_MODE=true in MC env to prevent accidental re-enablement
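One way the `MC_CONTROL_ROOM_MODE` guard could work is sketched below. The function names and the idea of calling the guard at the top of any legacy pickup entry point are assumptions for illustration; only the flag name and its `true` value come from this review.

```python
import os
from typing import Mapping, Optional


def control_room_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the MC environment opts into Control Room mode."""
    env = os.environ if env is None else env
    return env.get("MC_CONTROL_ROOM_MODE", "").strip().lower() == "true"


def guard_legacy_pickup(env: Optional[Mapping[str, str]] = None) -> None:
    """Hypothetical guard: refuse to start legacy ticket pickup in Control Room mode."""
    if control_room_mode(env):
        raise RuntimeError(
            "MC_CONTROL_ROOM_MODE=true: legacy ticket pickup is disabled"
        )
```

With a guard like this in place, re-enabling a pickup task file by accident would fail loudly instead of silently restarting the old path.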
Clean up stale runtime_sessions
- Archive runtime_sessions rows older than 30 days (keep completed for analytics, but move to archive table)
Retire mc_pickup.py
- Move mc_pickup.py to _deprecated/ (don't delete yet; keep for 30 days)
- The file is 312KB and 7,971 lines; removing it significantly reduces codebase complexity
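The runtime_sessions archiving step (rows older than 30 days moved to an archive table) might look roughly like the sketch below. It uses SQLite purely for illustration; the actual MC database engine, the `runtime_sessions_archive` table name, and the `ended_at` column (assumed to hold ISO-8601 UTC timestamps) are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def archive_old_sessions(conn: sqlite3.Connection, now: datetime,
                         max_age_days: int = 30) -> int:
    """Copy sessions that ended before the cutoff into the archive, then delete them.

    Rows with a NULL ended_at (still running) never match the comparison,
    so they are left untouched.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    with conn:  # one transaction: either both statements apply or neither does
        conn.execute(
            "INSERT INTO runtime_sessions_archive "
            "SELECT * FROM runtime_sessions WHERE ended_at < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM runtime_sessions WHERE ended_at < ?", (cutoff,)
        )
    return cur.rowcount  # number of rows moved
```

Doing the copy and the delete in one transaction avoids losing history if the job is interrupted halfway.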
Unify agent watcher model
- Luci watcher d237c9eb2a7c (Control Room watcher, no_agent script): keep as-is
- Standardize on: no_agent script for polling/health checks, LLM invocation only when actionable work is found
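The "cheap poll first, LLM only when needed" pattern can be sketched as follows. `fetch_tickets` and `invoke_llm` are hypothetical callables standing in for the real MC query and agent invocation, and treating only `todo` tickets as actionable is an assumption to adjust against the real status model.

```python
from typing import Callable, Iterable


def default_is_actionable(ticket: dict) -> bool:
    # Assumption: only "todo" tickets need an agent; adapt to the real MC statuses.
    return ticket.get("status") == "todo"


def poll_once(
    fetch_tickets: Callable[[], Iterable[dict]],
    invoke_llm: Callable[[list], None],
    is_actionable: Callable[[dict], bool] = default_is_actionable,
) -> int:
    """Run one polling pass; call the LLM only if actionable tickets exist."""
    actionable = [t for t in fetch_tickets() if is_actionable(t)]
    if actionable:
        invoke_llm(actionable)  # the only step that costs tokens
    return len(actionable)
```

A pass that finds nothing returns 0 without ever touching the model, which is the property that keeps a 5-minute schedule cheap.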
Solve the noise/cost issue
- The current watcher (d237c9eb2a7c) is a no_agent script ("no_agent": true); this is already the right approach
- mc-board-shepherd-5min was the expensive full-LLM-every-5-min job; correctly disabled
- Add NOTIFY or a lightweight webhook on ticket insert/update
- Fallback: keep no_agent cron at reduced frequency (e.g., every 15m) for resilience
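The combination of event-driven triggers with a polling fallback can be illustrated with a small sketch. Here an in-process `queue.Queue` stands in for whatever delivers the NOTIFY or webhook events; the function name and the `"fallback-poll"` marker are hypothetical.

```python
import queue


def next_trigger(events: "queue.Queue[str]", fallback_seconds: float) -> str:
    """Wait for a ticket event; if none arrives in time, fall back to a poll.

    Returns the event payload, or "fallback-poll" when the timeout expires.
    """
    try:
        return events.get(timeout=fallback_seconds)
    except queue.Empty:
        return "fallback-poll"
```

With `fallback_seconds` set to 900, the watcher would react immediately to events and still run a safety poll every 15 minutes if the event layer goes quiet.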
Document the unified model
- Update AGENTS.md and runbooks to reflect: all work routes through orchestrator, no direct runtime sessions
- Delete _deprecated/mc_pickup.py after 30-day quarantine
- Drop runtime_sessions table or archive to cold storage
- Remove tmux session management code from MC if no longer needed
Validate cost reduction
mc_pickup.py

| Artifact | Current State | Action |
|---|---|---|
| mc_pickup.py | 7,971 lines, not dispatching Luci tickets | Move to _deprecated/, retire in 30 days |
| runtime_sessions table | 178K rows, 0 active | Archive old rows, drop or keep for history |
| tmux mc-root | 2 windows | Inspect contents, kill if idle, document if active |
| Old ticket-pickup task files | Disabled (per pulse JSON) | Keep disabled, add config guard |
| mc-board-shepherd-5min cron | Disabled since 2026-05-29 | Delete after migration confirmed stable |
Larry's coding tasks: Larry already works via SSH from Luci's dev-loop session (per AGENTS.md). The Control Room model doesn't change this — it just means Luci explicitly spawns the session and tracks it via MC comments, rather than mc_pickup.py auto-dispatching. This is actually more controlled, not less.
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Larry workflow disruption | Medium | High | Verify Larry's current dispatch path before retiring mc_pickup.py; have Luci manually dispatch first few tickets |
| Ticket gets missed without auto-pickup | Low | High | no_agent watcher still polls every 5 min; adds comment/status on finding actionable tickets |
| Iris jobs incompatible with Control Room | Low | Medium | Iris already operates on different schedule/frequency; no change needed |
| Webhook layer fails, no polling fallback | Low | High | Keep no_agent cron at 15m interval as safety net until webhooks proven |
| Historical runtime data loss | Low | Low | Archive before dropping; 178K rows are mostly completed/failed sessions |
| Reversion pressure if issues arise | Medium | Medium | Keep deprecated code for 30 days; document rollback procedure |
Commit to Option 1: Full Control Room model.
The system is already mostly there. The migration is finishing work, not starting over. The cost savings, architectural clarity, and alignment with Elmar's intent make this the obvious choice.
Next action: Elmar to approve this recommendation, then execute Phase 1 immediately (zero downtime).