Fix ticket-pickup 409 row-drift self-healing after live runtime spawn
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 15d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_high · cwd /home/lucienne/workspace/mission-control/.claude/worktrees/pool-2 · uptime 15d 3h · last activity 15d 1h ago
Description
MC-4411
Fresh MC workflow audit found a remaining runtime-control issue: ticket-pickup can return HTTP 409 after a live mc-MC-XXXX tmux/runtime has actually been created, leaving the ticket reverted to todo or stale in_progress while runtime_sessions/tmux disagree. I manually repaired MC-4402 by normalizing ticket row + runtime pid after pickup returned 409.
Evidence from 2026-05-29 07:22 SAST:
- mc_pickup.py dispatch for MC-4402 exited 75 with HTTP Error 409: CONFLICT.
- A live mc-MC-4402 tmux pane existed immediately afterwards.
- runtime_sessions had ticket:4402:session but stale/contradictory fields until controller repair.
- Similar warnings hit MC-4406/MC-4323/MC-4410 earlier this morning.
Deliverable:
- Identify why dispatch/send reports 409 after spawn or while runtime is busy.
- Make pickup/recovery self-heal row drift: if live tmux + runtime exists, normalize ticket status/worker_pid/runtime status instead of reverting to todo or looping warnings.
- Add/adjust narrow tests for the 409-after-spawn/live-runtime case.
- Verify with a no-op/smoke ticket or controlled dry-run that ticket row, runtime_sessions, and tmux stay consistent.
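One way to picture the consistency check behind these deliverables is a small drift detector over the three sources of truth. This is only a sketch under stated assumptions: the `Snapshot` fields and the `find_drift` helper are hypothetical names for illustration and are not taken from mc_pickup.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical view of one ticket across ticket row, runtime_sessions, and tmux."""
    ticket_status: str             # e.g. "todo", "in_progress", "in_review", "done"
    worker_pid: Optional[int]      # pid recorded on the ticket row
    runtime_status: Optional[str]  # status in runtime_sessions, None if there is no row
    runtime_pid: Optional[int]     # pid recorded in runtime_sessions
    tmux_alive: bool               # whether the mc-MC-XXXX tmux session exists


def find_drift(s: Snapshot) -> List[str]:
    """Return human-readable descriptions of every disagreement found."""
    problems: List[str] = []
    # Assumption: a runtime counts as live when tmux exists and the row says "running".
    live = s.tmux_alive and s.runtime_status == "running"
    if live and s.ticket_status == "todo":
        problems.append("live runtime hidden behind todo")
    if live and s.worker_pid != s.runtime_pid:
        problems.append("worker_pid disagrees with runtime pid")
    if s.ticket_status == "in_progress" and not s.tmux_alive:
        problems.append("in_progress without a tmux session")
    return problems
```

An empty list means the ticket row, runtime_sessions, and tmux agree; a smoke ticket could assert exactly that after each pickup cycle.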
Acceptance:
- No repeated ticket-pickup warnings for 409 row drift in two consecutive pickup cycles after fix.
- A live runtime cannot be left hidden behind todo/stale in_progress state.
Expected check-in: 2026-05-29T08:09:46+02:00
Activity
Done · High · Luci
State: Done, Closed
People: luci_controller (system), Mission Control
Timing: 15d ago, 15d ago
Ticket is done; runtime is closed. (ticket_runtime_done)
profile claude_opus_1m_high · cwd /home/lucienne/workspace/mission-control/.claude/worktrees/pool-2 · uptime 15d 3h · last activity 15d 1h ago
MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
Raw console: luci · claude/anthropic/opus[1m]
Created from fresh MC workflow audit. This is the remaining systemic workflow issue after the UI/role-flow fixes: pickup/runtime row drift around HTTP 409 after a live runtime exists.
luci · 15d ago
Luci picking up this ticket (background worker).
luci · 15d ago
Ticket picked up by Luci via MC dispatcher.
MC-4411: Fix ticket-pickup 409 row-drift self-healing after live runtime spawn
Work this ticket in the live tmux runtime. Use DONE:, REVIEW:, or QUESTION: when you need MC to reflect the next state.
luci · 15d ago
Ticket runtime send failed while dispatching.
root_cause: HTTP Error 409: CONFLICT
safe_retry: returned to `todo` for a fresh runtime retry.
human_input_required: no
system · 15d ago
RUNTIME TERMINAL STATE (MC-3482 contract)
status: warning
summary: Ticket runtime send failed; parked for automatic recovery.
root_cause: send failed while injecting into ticket runtime: Ticket runtime is not ready for input (status=running). Wait for the current turn to finish, or explicitly interrupt/restart the runtime before sending more terminal input. This message was not sent.
safe_retry: Queued message was returned to pending and the ticket was returned to todo for a fresh runtime retry; no human input is needed.
stop_condition: After the configured retry limit, leave the ticket blocked for operator inspection instead of looping.
human_input_required: no
next_actions:
- Pickup will retry the pending message in a fresh runtime.
- Inspect runtime/send only if the retry fails again.
artifacts:
- ticket:MC-4411
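The stop_condition in this contract can be read as a small decision rule. The sketch below is illustrative only: the function name, the `max_retries` default of 3, and the returned field names are assumptions, since the ticket does not state the configured retry limit.

```python
def state_after_send_failure(failed_attempts: int, max_retries: int = 3) -> dict:
    """Decide where a ticket goes after a runtime send failure.

    failed_attempts counts send failures so far, including the current one.
    max_retries is a placeholder; the real limit is whatever is configured.
    """
    if failed_attempts >= max_retries:
        # Stop looping: park the ticket for operator inspection.
        return {"status": "blocked", "human_input_required": True}
    # Below the limit: return to todo so pickup retries in a fresh runtime.
    return {"status": "todo", "human_input_required": False}
```

For example, under the assumed limit of 3, a first failure returns the ticket to `todo`, while a third failure leaves it `blocked`.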
luci · 15d ago
Luci picking up this ticket (background worker).
Luci controller · 15d ago
Controller normalized its own reproduction: pickup returned 409 while creating live mc-MC-4411. Ticket/runtime restored to in_progress/running with worker_pid=126269. Use this live repro as evidence for the fix.
Luci · 15d ago
Ticket-pickup now self-heals 409 row-drift: adopts the live runtime (in_progress + live worker_pid, PID-alive gated) instead of reverting to todo, plus a proactive reconcile pass. mc_pickup.py + test, council CLEAN, 18 tests pass, both repos pushed.
luci · 15d ago
Workflow action `review_ready` applied. Work ready for review.
api · 15d ago
Gated + signed off. Verified: fix 9b5d6fd2 (workspace) + test c7d6cc1 (mission-control). _normalize_ticket_to_live_runtime adopts the live runtime on a post-spawn 409 instead of reverting to todo — breaking the todo<->409 re-claim loop. Liveness gated on the RECORDED Claude PID being alive (not just the tmux pane), so ghosts aren't resurrected; declines + lets TTL retire when PID dead. Guards confirmed: never regresses a ticket past in_progress (in_review/done), never steals from a different live worker, declines if it can't confirm ticket state. tests/test_mc4411_row_drift_selfheal.py — 8 passed locally. Council CLEAN per worker. Closing.
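The guards listed in this sign-off can be sketched as a pure decision function. This is not the code from commit 9b5d6fd2; `adopt_live_runtime`, `pid_alive`, and the dict-shaped ticket are hypothetical stand-ins, and the liveness probe assumes a POSIX host.

```python
import os
from typing import Callable, Optional

# Statuses past in_progress that must never be regressed (from the sign-off).
PAST_IN_PROGRESS = {"in_review", "done"}


def pid_alive(pid: Optional[int]) -> bool:
    """POSIX-only liveness probe: signal 0 checks existence without signalling."""
    if not pid or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def adopt_live_runtime(
    ticket: Optional[dict],
    recorded_pid: Optional[int],
    is_alive: Callable[[Optional[int]], bool] = pid_alive,
) -> Optional[dict]:
    """Return the normalized ticket row, or None when adoption is declined."""
    if ticket is None:
        return None  # cannot confirm ticket state
    if ticket.get("status") in PAST_IN_PROGRESS:
        return None  # never regress in_review/done back to in_progress
    if not is_alive(recorded_pid):
        return None  # dead recorded PID: decline and let the TTL retire the row
    owner = ticket.get("worker_pid")
    if owner and owner != recorded_pid and is_alive(owner):
        return None  # never steal from a different live worker
    return {**ticket, "status": "in_progress", "worker_pid": recorded_pid}
```

Passing the liveness check in as a parameter keeps the guard logic testable without spawning real processes.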
luci · 15d ago
Shadow reviewer verdict: UNCERTAIN (advisory only). Ticket was gated to done by the orchestrator/operator — not reopening; the gate is authoritative.
Reasons: The diff adds the core self-healing logic (_normalize_ticket_to_live_runtime, _reconcile_runtime_row_drift, and the 409 catch block in _send_ticket_to_tmux_runtime) with proper PID-liveness gating and guards against regressing tickets past in_progress. However, the diff is truncated mid-line in the 409 catch block, so the full integration cannot be verified. The worker claims 8 tests pass in tests/test_mc4411_row_drift_selfheal.py and a separate test commit c7d6cc1 in mission-control, but neither the test file nor the mission-control diff is included in the provided evidence, so test coverage cannot be confirmed. There is also no evidence of the required smoke/dry-run verification that ticket row, runtime_sessions, and tmux stay consistent.
Gaps:
- Test diff (tests/test_mc4411_row_drift_selfheal.py or mission-control commit c7d6cc1) not provided — cannot verify tests exist or cover the 409-after-spawn/live-runtime case
- Diff is truncated mid-line in the 409 catch handler — full integration code not inspectable
- No evidence of smoke/dry-run verification with a no-op ticket confirming consistency across ticket row, runtime_sessions, and tmux
- Cannot confirm the 409 catch block actually calls _normalize_ticket_to_live_runtime or handles the adoption correctly
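For orientation, the integration the reviewer could not inspect would plausibly have the following shape. This is a hypothetical sketch, not the truncated diff: `send_with_adoption` and its three callables are invented for illustration, and the use of `urllib.error.HTTPError` is an assumption based on the "HTTP Error 409: CONFLICT" message format.

```python
import urllib.error
from typing import Callable


def send_with_adoption(
    send: Callable[[], None],
    adopt: Callable[[], bool],
    revert: Callable[[], None],
) -> str:
    """Send to the runtime; on HTTP 409, try adopting the live runtime first.

    adopt() should return True when the ticket was normalized to the live
    runtime; revert() is the fallback that returns the ticket to todo.
    """
    try:
        send()
        return "sent"
    except urllib.error.HTTPError as exc:
        if exc.code != 409:
            raise  # only the conflict case is handled here
        if adopt():
            return "adopted"
        revert()
        return "reverted"
```

A test for the 409-after-spawn case would inject a `send` that raises 409 and assert the result is `"adopted"` rather than `"reverted"`.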
luci-operator · 15d ago
Luci Operator cleared stale failure/runtime state because the ticket already has review-ready worker evidence.