Fix MC Retry-as-To-Do semantics and false needs_input recovery
State: Done · Next action: Closed · Owner: Luci · Runtime: Closed · Age: 10d ago
Ticket is done; runtime is closed. · cwd /home/lucienne/workspace/state/control-room-worktrees/mc-4656-fix-mc-retry-as-to-do-semantics-and-fals-0b5cf6 · uptime 9d 19h · last activity 9d 14h ago
Description
MC-4656
Root cause observed on MC-4637 (2026-06-03): the user clicked the ticket icon labeled "Retry as To Do" intending to move a false needs_input ticket back to todo, i.e. to signal that Luci should own it. The API path /api/v1/tickets/<id>/runtime/retry does more than reset the status: it calls _queue_and_pickup(... immediate=True), which creates a runtime (mc-MC-4637). The current code does this intentionally, but the label/UX is misleading.
Also, MC-4637 was stuck in needs_input even though it had no real QUESTION or human blocker. The routing log shows a manual_api transition in_progress→needs_input with the reason "No live worker or interactive session is attached to this ticket." The Control Room pickup loop treats needs_input as human_wait, so it never auto-requeues these false needs_input rows.
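To make the stall concrete, here is a minimal sketch of a lane-based pickup filter, assuming the loop maps each status to a lane and dispatches only from the pickup lane. `Lane`, `LANES`, and `auto_requeue_candidates` are hypothetical names, not the actual Control Room code.

```python
from enum import Enum


class Lane(Enum):
    PICKUP = "pickup"          # eligible for automatic dispatch
    ACTIVE = "active"          # a worker already owns the ticket
    HUMAN_WAIT = "human_wait"  # parked until a human answers
    TERMINAL = "terminal"      # finished


# Assumed status-to-lane mapping reflecting the behavior described above.
LANES = {
    "todo": Lane.PICKUP,
    "in_progress": Lane.ACTIVE,
    "needs_input": Lane.HUMAN_WAIT,
    "done": Lane.TERMINAL,
}


def auto_requeue_candidates(tickets):
    """Only PICKUP rows are dispatched. HUMAN_WAIT rows are skipped no matter
    why they entered needs_input, so a false needs_input row never moves."""
    return [t["id"] for t in tickets if LANES[t["status"]] is Lane.PICKUP]
```

Because the filter looks only at the lane, the reason text never gets a chance to matter; the recovery item in the acceptance list targets exactly that gap.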
Acceptance:
- Split the UI/action semantics: either provide a pure "Move to To Do" / "Clear false needs_input" action that does not start a runtime, or relabel the existing runtime retry as "Retry now / start runtime".
- Add deterministic recovery for needs_input rows whose only reason is "No live worker or interactive session is attached" and that have no QUESTION, human blocker, or comment, moving them back to todo under the Control Room loop.
- Ensure the Control Room watcher owns Luci pickup; avoid surprise legacy immediate _queue_and_pickup calls triggered by a status-reset action.
- Add a regression test covering needs_input→todo without runtime creation, and runtime retry still creating a runtime only when explicitly requested.
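A minimal sketch of the split semantics and the recovery predicate, written as a plain-Python model rather than the real API handler. `Ticket`, `is_false_needs_input`, `retry_ticket`, and the `open_questions`/`human_blockers` fields are hypothetical; `queue_and_pickup` is a stand-in for the legacy `_queue_and_pickup(... immediate=True)` path, injected so a test can observe whether it ran.

```python
from dataclasses import dataclass, field
from typing import Callable

NO_WORKER_REASON = "No live worker or interactive session is attached to this ticket."


@dataclass
class Ticket:
    id: str
    status: str
    reason: str = ""
    open_questions: list = field(default_factory=list)  # hypothetical: unanswered QUESTION: markers
    human_blockers: list = field(default_factory=list)  # hypothetical: human blockers or comments


def is_false_needs_input(t: Ticket) -> bool:
    """needs_input caused only by 'no worker attached', with nothing for a human to answer."""
    return (
        t.status == "needs_input"
        and t.reason == NO_WORKER_REASON
        and not t.open_questions
        and not t.human_blockers
    )


def retry_ticket(t: Ticket, queue_and_pickup: Callable[[Ticket], None],
                 start_runtime: bool = False) -> None:
    """Pure status reset by default; dispatch only when explicitly requested."""
    t.status, t.reason = "todo", ""
    if start_runtime:
        queue_and_pickup(t)  # stand-in for the immediate-pickup path


def test_retry_semantics():
    started = []
    t = Ticket("MC-4637", "needs_input", NO_WORKER_REASON)
    assert is_false_needs_input(t)
    retry_ticket(t, started.append)
    assert t.status == "todo" and started == []    # no runtime created
    retry_ticket(t, started.append, start_runtime=True)
    assert [x.id for x in started] == ["MC-4637"]  # explicit request dispatches
```

Injecting the pickup callable keeps the regression test free of any real runtime: an empty `started` list is the evidence that no runtime was created.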
Activity
Details: Done · Critical · Luci
State: Done · Closed
People
Timing / Details: luci-controller (system) · Mission Control · 10d ago · 10d ago
Advanced / Operator evidence
Routing owner
Operator console
Ticket is done; runtime is closed. (false_needs_input_recovered)
MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
Raw console: luci-controller · claude-code
[orchestrator-architecture-requirement] Elmar clarified the core issue: MC board actions must not independently decide to start runtimes. The intended model: ticket events, messages, and status nudges go to Luci (the orchestrator) first; Luci decides whether to start a runtime, delegate to Larry/Tessa/Scott/Council, ask a question, or just requeue. A user clicking "Retry as To Do" may be a signal to Luci, not consent for MC to auto-spawn a runtime. The fix should remove hidden board→runtime side effects from generic retry/status actions, or make any explicit "Start runtime now" action clearly labeled and auditable as such. The Control Room/Hermes watcher should be the decision boundary.
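One way to picture that decision boundary, as a sketch under stated assumptions: the board turns every click into an event and hands it to the orchestrator, which alone decides and records the outcome. `BoardEvent`, `Orchestrator`, the action strings, and the toy policy are all illustrative, not the actual MC or Hermes code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Decision(Enum):
    START_RUNTIME = auto()
    DELEGATE = auto()      # e.g. hand off to Larry/Tessa/Scott/Council
    ASK_QUESTION = auto()
    REQUEUE = auto()


@dataclass(frozen=True)
class BoardEvent:
    ticket_id: str
    action: str   # hypothetical action names, e.g. "move_to_todo", "start_runtime_now"
    actor: str


class Orchestrator:
    """Stand-in for Luci: the only component allowed to decide on runtimes."""

    def __init__(self):
        self.audit = []  # (ticket_id, actor, action, decision) for every board signal

    def decide(self, event: BoardEvent) -> Decision:
        # Toy policy: only an explicitly labeled action counts as consent to start.
        # A real orchestrator would weigh ticket context before choosing
        # DELEGATE or ASK_QUESTION instead.
        if event.action == "start_runtime_now":
            decision = Decision.START_RUNTIME
        else:
            decision = Decision.REQUEUE
        self.audit.append((event.ticket_id, event.actor, event.action, decision.name))
        return decision


def handle_board_action(event: BoardEvent, orchestrator: Orchestrator) -> Decision:
    # The board forwards the signal; it never spawns a runtime itself.
    return orchestrator.decide(event)
```

Routing every action through `decide` gives a single place to audit who asked for what and what was actually done.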
luci-controller · 10d ago
[orchestrator-architecture-requirement] Runtime/session lifecycle clarification from Elmar: do not close a ticket's tmux/runtime merely because the worker emitted DONE/REVIEW or moved the ticket to in_review. Keep the execution session available until Luci/the orchestrator has reviewed and explicitly signed off or returned fixes, so the context can be reused and the worker can be re-entered without spawning a fresh context/window and rereading everything. Desired lifecycle: todo/no runtime -> Luci decides to start -> in_progress/live runtime -> worker done => in_review + runtime parked/idle/reattachable -> Luci signs off => close/archive runtime; or Luci returns fixes => reuse/reattach the same runtime when practical, otherwise start a new one only with an auditable reason. Acceptance should distinguish "worker finished" from "orchestrator accepted/closed".
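The desired lifecycle can be modeled as a small transition table, sketched here with hypothetical event names (`luci_start`, `worker_done`, `luci_signoff`, `luci_return_fixes`) and a hypothetical `TicketRuntime` class. The key property is that `worker_done` only parks the runtime; closing it requires `luci_signoff`, and a fresh runtime on return-for-fixes is allowed only with a recorded reason.

```python
from dataclasses import dataclass, field

# (status, event) -> (next status, runtime state). Event names are hypothetical.
TRANSITIONS = {
    ("todo", "luci_start"): ("in_progress", "live"),
    ("in_progress", "worker_done"): ("in_review", "parked"),  # worker finished
    ("in_review", "luci_signoff"): ("done", "closed"),        # orchestrator accepted
    ("in_review", "luci_return_fixes"): ("in_progress", "live"),
}


@dataclass
class TicketRuntime:
    status: str = "todo"
    runtime: str = "none"   # none | live | parked | closed
    generation: int = 0     # bumps only when a fresh runtime is spawned
    audit: list = field(default_factory=list)

    def apply(self, event: str, can_reattach: bool = True, reason: str = "") -> None:
        key = (self.status, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal transition: {key}")
        if event == "luci_start":
            self.generation += 1
        elif event == "luci_return_fixes" and not can_reattach:
            # A fresh context is allowed, but only with an auditable reason.
            if not reason:
                raise ValueError("starting a new runtime for fixes requires a reason")
            self.generation += 1
            self.audit.append(f"new runtime: {reason}")
        self.status, self.runtime = TRANSITIONS[key]
```

Reattaching on `luci_return_fixes` leaves `generation` unchanged, which is how the sketch tells "same context reused" apart from "new context started".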
luci-controller · 10d ago
[orchestrator-priority] Promoted to critical: this ticket fixes the control-plane confusion Elmar just raised. It should be the next dispatch after the current active worker clears; board retry/status actions must stop bypassing Luci runtime decisions.
luci-controller · 10d ago
[orchestrator-queue] MC-4656 is now critical and is the next Control Room pickup candidate. It is currently held only by the active-worker guard while MC-4640 is running; if MC-4640 finishes or is recovered, pickup dry-run selects MC-4656 next.
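A sketch of what the pickup dry-run might compute, assuming a single worker slot and critical-first ordering. `pickup_dry_run`, `PRIORITY_RANK`, and `max_workers` are hypothetical; only MC-4656's critical priority comes from the ticket.

```python
# Assumed ordering; only "critical" is confirmed for MC-4656.
PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def pickup_dry_run(tickets, active_workers, max_workers=1):
    """Return the id that would be dispatched next, or None if nothing would be."""
    if active_workers >= max_workers:
        return None  # held by the active-worker guard
    todo = [t for t in tickets if t["status"] == "todo"]
    if not todo:
        return None
    # min() keeps the first of equally ranked tickets, so ties fall back to queue order.
    return min(todo, key=lambda t: PRIORITY_RANK[t["priority"]])["id"]
```

With MC-4640 in progress and one slot, the sketch returns `None`; once `active_workers` drops to 0, it returns `"MC-4656"` ahead of any lower-priority todo rows.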
luci-followup-sweep · 10d ago
[follow-up] 8 min past expected check-in (2026-06-03T11:56:00.086185+00:00 UTC).
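The lateness figure is plain clock arithmetic. Assuming the sweep compares the current UTC time against the stored ISO-8601 check-in, a sketch (`minutes_past_checkin` is a hypothetical helper):

```python
from datetime import datetime


def minutes_past_checkin(expected_iso: str, now: datetime) -> int:
    """Whole minutes elapsed since the expected check-in (0 if not yet due)."""
    expected = datetime.fromisoformat(expected_iso)  # keeps the +00:00 offset
    return max(0, int((now - expected).total_seconds() // 60))
```

For example, a sweep running at 12:04:30 UTC would compute 8 whole minutes past the 11:56:00 check-in.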
luci · 10d ago
Luci picking up this ticket (background worker).
luci · 10d ago
Ticket picked up by Luci via MC dispatcher.
MC-4656: Fix MC Retry-as-To-Do semantics and false needs_input recovery
Work this ticket in the live tmux runtime. Use DONE:, REVIEW:, or QUESTION: when you need MC to reflect the next state.
luci · 10d ago
Pickup hit a busy/conflict response, but a live runtime already exists (mc-MC-4656).
detail: HTTP Error 409: CONFLICT
self_heal: normalized ticket → in_progress and kept the live runtime (no revert to todo).
human_input_required: no
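The self-heal above can be sketched as follows. The logged `HTTP Error 409: CONFLICT` text matches the string format of Python's `urllib.error.HTTPError`, so the sketch uses that exception type; `self_heal_pickup` and the `pickup`/`live_runtime_exists` callables are hypothetical.

```python
import urllib.error


def self_heal_pickup(ticket, pickup, live_runtime_exists):
    """Try to dispatch; on 409 with a live runtime, keep it and mark the ticket in_progress."""
    try:
        pickup(ticket["id"])
        return "dispatched"
    except urllib.error.HTTPError as err:
        if err.code == 409 and live_runtime_exists(ticket["id"]):
            ticket["status"] = "in_progress"  # normalize; do not revert to todo
            return "kept_live_runtime"
        raise  # any other failure is a real error
```

Treating 409 as "already running" only when a live runtime is confirmed avoids masking a genuine conflict.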
api · 10d ago
Controller sign-off (ground-truth verified, NOT the false reviewer verdict): DONE_LANDED. Commit 66d8e08 on mission-control master — false needs_input recovery + split retry-as-todo semantics; new control_room_pickup.py + tests. Verified: pytest test_control_room_pickup + test_ticket_runtime = 128 passed. Closing.
Controller decision
luci-controller · 10d ago
[controller-verification] MC-4656 fixed/verified manually after board quarantine. Split retry semantics are present: plain Move to To Do posts /runtime/retry with no start_runtime and does not call _queue_and_pickup for Luci; explicit start_runtime:true still dispatches. Verification: focused retry tests 6 passed; broader runtime/control-room suite 214 passed, 13 subtests passed; control_room_pickup dry-run selected no dispatch with active_workers=0. Marking done.
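The two client calls differ only in the request body. A standard-library sketch, assuming a JSON body and a placeholder host (`BASE_URL` and `build_retry_request` are hypothetical; only the path and the `start_runtime` flag come from the ticket):

```python
import json
import urllib.request

BASE_URL = "http://mission-control.invalid"  # hypothetical host


def build_retry_request(ticket_id: str, start_runtime: bool = False) -> urllib.request.Request:
    # Plain "Move to To Do" sends no start_runtime; the explicit action sends true.
    body = {"start_runtime": True} if start_runtime else {}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/tickets/{ticket_id}/runtime/retry",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request objects are only built here, not sent; `urllib.request.urlopen` would perform the POST.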
luci-controller · 10d ago
[mobile-visual-gate] APPROVED mobile. Live 375px screenshot: /home/lucienne/workspace/reports/qa/mc-4656-mobile-375.png. Console: only SW registration log. Visual review: no overlapping controls, unusable layout, unreadable critical text, or obvious breakage; minor density only.
luci-controller · 10d ago
[control-room-dispatch] Control Room dispatched MC-4656 to a Claude Code worker.
Worktree: /home/lucienne/workspace/state/control-room-worktrees/mc-4656-fix-mc-retry-as-to-do-semantics-and-fals-0b5cf6
Branch: cr/mc-4656-fix-mc-retry-as-to-do-semantics-and-fals-0b5cf6
tmux: cr-MC-4656
Expected check-in: 2026-06-03T11:56:00.086185+00:00
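The worktree, branch, and tmux names above follow a visible pattern: lowercase ticket id, a title slug cut at 40 characters, and a 6-hex suffix. A sketch that reproduces the prefix; the suffix derivation here (SHA-1 of id and title) is an assumption and will not reproduce `0b5cf6`, and `slugify`/`dispatch_names` are hypothetical.

```python
import hashlib
import re

SLUG_MAX = 40  # matches the 40-character title fragment seen above; an assumption


def slugify(title: str, max_len: int = SLUG_MAX) -> str:
    # Collapse every run of non-alphanumerics (spaces, underscores) into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def dispatch_names(ticket_id: str, title: str, root: str) -> dict:
    # Hypothetical suffix; the real 6-hex suffix's derivation is not documented.
    suffix = hashlib.sha1(f"{ticket_id}:{title}".encode()).hexdigest()[:6]
    name = f"{ticket_id.lower()}-{slugify(title)}-{suffix}"
    return {"worktree": f"{root}/{name}", "branch": f"cr/{name}", "tmux": f"cr-{ticket_id}"}
```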
luci-controller · 10d ago
[control-room-recover] MC-4656: recovered DONE from a finished direct worker whose tmux was gone. Ticket → in_review. Summary: MC-4656 shipped in conrelma/mission-control — branch feat/mc-4656-false-needs-input-recovery merged to master, SHA 66d8e08 (pushed); luci-workspace cr/ landing pointer 8647be26. control_room_pickup.py adds atomic, flap-bounded false-needs_input→todo recovery (no runtime started); /runtime/retry split so Luci defaults to pure status reset (start_runtime:true for immediate, Larry always immediate);
luci-controller · 10d ago
[orchestrator-review] Review failed: focused runtime regression suite currently has 5 failures after the MC-4656 merge (207 passed, 5 failed). Returned from in_review to todo for rework instead of pretending done.
luci-controller · 10d ago
[control-room-recover] MC-4656: cleared a false needs_input — it was parked in the human-wait lane only because no live worker/session was attached (reason: 'No live worker or interactive session is attached to this ticket.'), with no QUESTION or human blocker. Requeued to todo; the Control Room pickup loop now owns dispatch. No runtime was started by this reset.