MC-4304

Fix operator conflict + dead-worker retry + kill-switch
2026-06-13 08:50:04 SAST
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 17d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_medium · cwd /home/lucienne/workspace/mission-control · uptime 16d 18h · last activity 16d 16h ago

Description

# MC-4293: Fix operator conflict + add dead-worker retry

**Priority:** high
**Assigned:** luci
**Depends on:** nothing (can start immediately, in parallel with MC-4290)

## What to do

Two problems:

1. `luci_operator.py` re-opens "done" tickets independently of the Controller, fighting the review loop.
2. When a Worker dies mid-task, nothing retries; the ticket just sits at `todo` with a "Worker died unexpectedly" comment.

## Steps

### Part A: Stop the operator from re-opening tickets

1. In `luci_operator.py`, find the function that re-opens "done" tickets (likely `reopen_weak_completions` or similar).
2. Add a check: if the ticket has a `shadow_reviews` row with `verdict='pass'` for its `done_sha`, do NOT re-open. The reviewer already approved it.
3. Add a check: if `review_cycles` >= 1 (the ticket went through the review loop), do NOT re-open. The Controller already handled it.
4. Keep the operator's health recording. It should still observe and log, just not move tickets that the Controller has already judged.

### Part B: Add dead-worker retry

1. In `mc_pickup.py` or wherever the "Worker died unexpectedly" message is generated, add retry logic:
   - If this is the FIRST death for this ticket (check `review_cycles` or add a `death_count` field), set status back to `todo` and re-dispatch.
   - If this is the SECOND death, set status to `needs_input`, add comment "Worker died twice. Needs Elmar to review.", and escalate to Elmar.
   - Reset `death_count` when a ticket successfully completes.
2. Add a `death_count INTEGER DEFAULT 0` field to the tickets table (or use an existing comment-counting approach).
3. Make sure the retry doesn't fight with MC-4291's review loop:
   - If the Worker died AND there's a QA fail verdict, the QA fail takes precedence (send back with feedback).
   - If the Worker died with no verdict, retry once silently.

### Part C: Make the kill-switch work

1. In `mc_orchestrator_flags.py`, verify `killswitch_active()` works correctly.
2. Wire it into `mc_pickup.py`: if kill-switch is active, skip all auto-dispatch (no new worker pickups, no operator actions, no review loop actions). Only triage and manual actions should work.
3. Wire it into `luci_operator.py`: if kill-switch is active, skip all ticket-moving actions.
4. **Commit and push.**

## Acceptance criteria

- Operator does not re-open tickets that have a passing QA reviewer verdict
- Dead workers retry once automatically
- Dead workers twice → escalate to Elmar
- Kill-switch stops all auto-behaviour when engaged
- Operator still records health metrics (passive observer)

## If blocked

- If `luci_operator.py` is too large/complex to modify safely, comment out the re-open logic entirely and add a TODO for a cleaner refactor. The important thing is to stop the fighting.
- If adding DB fields requires a migration and migrations are complex, use a JSON field or a separate tracking table instead.
- Test the kill-switch: `touch /home/lucienne/workspace/mission-control/.mc_killswitch` and verify auto-dispatch stops.

## What NOT to do

- Do not change the shadow reviewer (MC-4290)
- Do not change the review loop (MC-4291)
- Do not touch Tessa (MC-4292)
- Do not clean the inbox (MC-4294)
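
## Sketch of the decision rules (illustrative only)

The rules in Parts A to C can be sketched as plain functions. This is a minimal sketch under stated assumptions, not the actual code in `luci_operator.py`, `mc_pickup.py`, or `mc_orchestrator_flags.py`. The `Ticket` dataclass, the function names `operator_may_reopen`, `handle_worker_death`, and `mark_completed`, and the returned action strings are all hypothetical. The file-based `killswitch_active()` is an assumption inferred from the `touch` test in "If blocked". The sketch also assumes that a death accompanied by a QA fail verdict does not increment `death_count`, which the ticket does not specify.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Path taken from the "If blocked" section of this ticket.
KILLSWITCH_PATH = Path("/home/lucienne/workspace/mission-control/.mc_killswitch")


def killswitch_active(path: Path = KILLSWITCH_PATH) -> bool:
    """Assumed behaviour: the kill-switch is engaged when the flag file exists."""
    return path.exists()


@dataclass
class Ticket:
    """Hypothetical in-memory view of a tickets row (not the real schema)."""
    status: str = "todo"
    review_cycles: int = 0
    death_count: int = 0  # mirrors the proposed `death_count INTEGER DEFAULT 0`
    # True if shadow_reviews has a verdict='pass' row for this ticket's done_sha.
    has_passing_review: bool = False


def operator_may_reopen(ticket: Ticket, killswitch: bool) -> bool:
    """Part A plus Part C step 3: may the operator re-open this ticket?"""
    if killswitch:
        return False  # kill-switch engaged: no ticket-moving actions
    if ticket.has_passing_review:
        return False  # the reviewer already approved it
    if ticket.review_cycles >= 1:
        return False  # the Controller already handled it
    return True


def handle_worker_death(ticket: Ticket, qa_verdict: Optional[str]) -> str:
    """Part B: update the ticket after a worker death and return the action."""
    if qa_verdict == "fail":
        # QA fail takes precedence over the retry logic.
        # Assumption: this death is not counted towards death_count.
        return "send_back_with_feedback"
    ticket.death_count += 1
    if ticket.death_count >= 2:
        ticket.status = "needs_input"
        return "escalate"  # comment: "Worker died twice. Needs Elmar to review."
    ticket.status = "todo"
    return "redispatch"  # first death: retry once silently


def mark_completed(ticket: Ticket) -> None:
    """Reset the death counter when a ticket successfully completes."""
    ticket.status = "done"
    ticket.death_count = 0
```

A caller in the pickup loop would presumably check `killswitch_active()` before acting on a `"redispatch"` result, so that an engaged kill-switch also blocks automatic retries. Keeping the rules as pure functions over a small record makes them easy to test without touching the database.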

Activity

done
No activity yet