Repair Luci Control Room active pickup loop so dev backlog is dispatched, not only alerted
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 10d ago
Ticket is done; runtime is closed. · cwd /home/lucienne/workspace/state/control-room-worktrees/mc-4631-orchestration-repair · uptime 10d 1h · last activity 9d 23h ago
Description
MC-4631
Root cause from 2026-06-03 Telegram escalation: legacy ticket-pickup/needs-input-pickup were correctly disabled after duplicate-spawn regressions, but the replacement Luci Control Room watcher is no_agent:true and alert-only. It detects new high-priority items/stuck workers but does not actively dispatch medium/low inbox/todo backlog into runtimes.
Acceptance: implement/verify an event-driven controller handoff or a deterministic single-ticket dispatcher that moves Luci-owned dev tickets from inbox/todo into worker runtimes with an MC runtime ledger, respects duplicate-launch guards, stays silent when there is no work, and provides evidence via a controlled smoke ticket.
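The acceptance criteria above can be sketched as a selection function. This is a minimal illustration, not the delivered `control_room_pickup.py`: the `Ticket` fields, the numeric priority convention, and the `select_one` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical ticket shape; field names are assumptions, not the MC schema.
@dataclass(frozen=True)
class Ticket:
    id: str
    owner: str
    status: str        # e.g. "inbox", "todo", "in_progress", "blocked"
    priority: int      # assumed convention: lower number = more urgent
    has_runtime: bool  # True if a worker runtime is already attached


ELIGIBLE_STATUSES = {"inbox", "todo"}


def select_one(tickets: list[Ticket], owner: str = "Luci") -> Optional[Ticket]:
    """Return at most one ticket to dispatch, or None when there is no work."""
    candidates = [
        t for t in tickets
        if t.owner == owner
        and t.status in ELIGIBLE_STATUSES
        and not t.has_runtime  # duplicate-launch guard
    ]
    if not candidates:
        return None  # stay silent: nothing to dispatch
    # Sort key (priority, id) makes repeated runs pick the same ticket.
    return min(candidates, key=lambda t: (t.priority, t.id))
```

Returning `None` rather than raising keeps the no-work case quiet, and the fixed sort key is what makes the choice deterministic across runs.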
Details: Done · High · Luci
State: Done · Closed
Timing / Details: telegram-escalation (human) · Mission Control · 10d ago
Operator console
controller_gate_closed. MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
Raw console: luci-controller · claude-code
Activity
system · 10d ago
RUNTIME TERMINAL STATE (MC-3482 contract)
status: warning
summary: Live ticket runtime timed out without a harvestable verdict; parked for automatic runtime recovery.
root_cause: harvest timeout — pane was silent/wedged and the transcript carried no verdict to sweep
safe_retry: Runtime was killed and the ticket was returned to todo for a fresh Luci-owned retry; no human input is needed.
stop_condition: After the configured retry limit, leave the ticket blocked for operator inspection instead of looping.
human_input_required: no
next_actions:
- Pickup will start a fresh runtime from MC history.
- Inspect the pane log artifact only if the retry fails again.
artifacts:
- ticket:MC-4631
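The `safe_retry` and `stop_condition` fields above describe a bounded retry policy. A minimal sketch of that decision follows; the function name and the way the retry count is tracked are assumptions, and only the `todo` and `blocked` status names come from the record itself.

```python
def status_after_harvest_timeout(retries_so_far: int, retry_limit: int) -> str:
    """Decide where a timed-out ticket goes next.

    Hypothetical helper: below the limit the ticket returns to "todo" for a
    fresh retry; at or past the limit it is parked as "blocked" for operator
    inspection instead of looping.
    """
    if retry_limit < 0 or retries_so_far < 0:
        raise ValueError("retry counts must be non-negative")
    if retries_so_far < retry_limit:
        return "todo"
    return "blocked"
```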
api · 10d ago
MC-4631 delivered on branch cr/mc-4631-orchestration-repair (commit 96deed3, pushed).
scripts/control_room_pickup.py is a deterministic no_agent loop that replaces the alert-only watcher. It reconciles stranded/finished direct workers (recovers the MC-4612 class), promotes actionable inbox->todo, and dispatches exactly one bounded detached worker.
5-model council applied: atomic CAS; tmux-dies-on-exit wedge fix; shlex-quoted cmd + identifier validation; tee -a header survival; reconcile grace window; con.rollback; flock; phase isolation; prompt DoS cap.
Tests: 40 pass.
Live proof: dry-run selects exactly one ticket (MC-4632) and skips active/blocked/human-wait; reconcile-only recovered the genuinely stranded MC-4637 (in_progress->todo).
Wired via the deployed shim ~/.hermes/scripts/luci_ticket_watcher.py; the loop activates on merge to master. Legacy pickup stays disabled.
Kill switches: CR_PICKUP_DISABLE / CR_PICKUP_NO_DISPATCH.
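The "atomic CAS" and "con.rollback" items can be illustrated with a compare-and-swap claim on a ticket row. This sketch assumes a SQLite store with a `tickets(id, status)` table; the table layout and the `claim_ticket` helper are hypothetical and not taken from the delivered script.

```python
import sqlite3


def claim_ticket(con: sqlite3.Connection, ticket_id: str,
                 expected: str = "todo", new: str = "in_progress") -> bool:
    """Move a ticket from `expected` to `new` only if it is still `expected`.

    The WHERE clause makes the update a compare-and-swap: if another process
    already claimed the ticket, no row matches and the claim fails cleanly.
    """
    try:
        cur = con.execute(
            "UPDATE tickets SET status = ? WHERE id = ? AND status = ?",
            (new, ticket_id, expected),
        )
        con.commit()
        return cur.rowcount == 1
    except sqlite3.Error:
        con.rollback()  # leave the row untouched on any database error
        raise
```

A second caller racing on the same ticket gets `False` and can skip dispatch, which is one way to respect the duplicate-launch guard without holding a long-lived lock.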
luci-followup-sweep · 10d ago
[follow-up] 27 min past expected check-in (2026-06-03T08:35:46.418039+02:00).
luci-controller · 10d ago
Controller escalated this as the P0 orchestration breakage: active watcher is alert-only, legacy pickup disabled, tickets not moving.
Dispatched direct Coder worker to build the active pickup/recovery loop.
Runtime: tmux cr-MC-4631, pid 2722013, worktree /home/lucienne/workspace/state/control-room-worktrees/mc-4631-orchestration-repair.
Expected check-in: 2026-06-03T08:35:46.418039+02:00
Completion gate: branch+tests+smoke proof; no legacy pickup re-enable.
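As an illustration of how a detached runtime like `tmux cr-MC-4631` might be launched safely, the sketch below validates the ticket identifier and quotes the worker command before handing it to tmux. The `build_tmux_argv` helper and the `MC-<digits>` pattern are assumptions for the example; only the `cr-<ticket>` session naming comes from the comment above.

```python
import re
import shlex

# Assumed identifier format, inferred from ids such as MC-4631.
TICKET_ID_RE = re.compile(r"MC-\d+")


def build_tmux_argv(ticket_id: str, worktree: str,
                    worker_argv: list[str]) -> list[str]:
    """Build an argv list for a detached tmux session running one worker."""
    if not TICKET_ID_RE.fullmatch(ticket_id):
        raise ValueError(f"invalid ticket id: {ticket_id!r}")
    # tmux runs its final argument through a shell, so quote each part.
    shell_cmd = shlex.join(worker_argv)
    return [
        "tmux", "new-session", "-d",  # -d: start detached
        "-s", f"cr-{ticket_id}",      # -s: session name
        "-c", worktree,               # -c: start directory
        shell_cmd,
    ]
```

Passing the result to `subprocess.run(argv, check=True)` avoids a second layer of shell interpretation around the tmux call itself.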
luci-controller · 10d ago
Additional root cause from the MC-4637 false needs_input: routing_log shows generic/manual_api transitions, not worker output. At 07:53 the status went todo→in_progress with no runtime/session/tmux; a generic PATCH only updates the DB and does not launch a worker. At 08:05 the status went in_progress→needs_input with failure_reason 'No live worker or interactive session is attached to this ticket.' No comment or runtime existed. This is a technical no-start state misclassified as a human gate. The fix must reject or auto-normalize in_progress/needs_input without runtime evidence and route eligible Luci tickets through the active dispatcher instead.
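The "reject or auto-normalize" requirement can be sketched as a guard on status transitions. The helper name and the choice to normalize to `todo` are assumptions for illustration; the source leaves open whether the fix rejects or normalizes.

```python
# Statuses that only make sense while a worker runtime is attached.
RUNTIME_STATUSES = {"in_progress", "needs_input"}


def normalize_status(requested: str, has_runtime_evidence: bool) -> str:
    """Return the status that should actually be stored.

    Hypothetical guard: a runtime-bound status requested without runtime
    evidence falls back to "todo", so the dispatcher can pick the ticket up
    instead of leaving it parked as a false human gate.
    """
    if requested in RUNTIME_STATUSES and not has_runtime_evidence:
        return "todo"
    return requested
```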
luci-controller · 10d ago
Controller landed MC-4631 active orchestration loop.
Evidence:
- Rebased branch cr/mc-4631-orchestration-repair onto current master after MC-4612.
- Tests: `python3 -m pytest tests/test_control_room_pickup.py tests/test_mc4612_home_rail_close_reason.py tests/test_mc4123_home_v2.py -q` → 79 passed.
- Fast-forward merged commit 0a4f12a to master and pushed origin/master.
- Dry-run smoke: promotes eligible inbox tickets and selects exactly one dispatch target; skips active/blocked/human-wait; no mutations in dry-run.
- luci-dashboard.service restarted; local + Tailscale API smoke returned HTTP 200.
Note: luci-persistent remains only a dormant interactive console/keepalive, not the controller path. The active loop is the no_agent watcher + control_room_pickup.py.