MC-4357

Harvest DB-write race strands completed tickets in needs_input (signal-miss recurrence)
2026-06-13 08:50:14 SAST
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 16d ago
Ticket is done; runtime is closed. · profile codex · cwd /home/lucienne/workspace/mission-control · uptime 16d 4h · last activity 15d 15h ago

Description

ROOT CAUSE for recurring signal-harvester-miss incidents (operator-cleaned 2026-05-28 ~03:18 and again ~06:2x).

SYMPTOM: Under concurrent worker bursts (tonight: 6-worker auto-review cascade MC-4348..4353), workers finish and emit {mc_signal:DONE} to panes, but tickets strand in needs_input / bounce done->in_progress instead of settling at done. Operator manually advanced MC-4337/4346/4349/4351/4352/4353/4354.

EVIDENCE:
- Durable signal files state/mc-signals/MC-<id>.json contain signal=DONE for stranded tickets (harvest SAW the sentinel), yet the ticket DB status never advanced -> the final UPDATE failed, not the read.
- logs/mc_pickup.log: 'database is locked' in restart_trace setup, orchestrator drain skipped, orchestrator_drain ok:false.
- mc.db journal_mode=wal (good), but harvest/update connections appear to lack busy_timeout -> instant lock-fail under contention.
- MC-4353: after manual advance to done it bounced to in_progress because its idle post-DONE runtime was re-dispatched.

SUGGESTED FIX (dev-loop Tier 2):
1. All harvest/status-update DB connections in mc-orchestrator-build-mc/ticket_runtime.py set PRAGMA busy_timeout>=30000 + retry-on-locked.
2. Idempotent reconcile: on each harvest pass, if a durable DONE/REVIEW signal exists for a non-terminal ticket, re-apply it -> a single lost write self-heals on the next cycle.
3. Retire idle post-DONE runtimes promptly so completed tickets stop bouncing.

Confirm the live harvest path (mc-orchestrator-build-mc/ticket_runtime.py vs mission-control/) before editing. Add a regression test for concurrent UPDATE + lock.
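A minimal sketch of fix items 1 and 2, not the actual ticket_runtime.py code. Assumptions (none of these are confirmed by the ticket): a `tickets(id, status)` table, signal files shaped like `{"signal": "DONE"}`, a REVIEW signal mapping to a `review` status, and `done` as the only terminal status. The helper names `connect`, `update_status`, and `reconcile` are hypothetical.

```python
import json
import sqlite3
import time
from pathlib import Path

# Hypothetical mappings; the real status names may differ.
SIGNAL_TO_STATUS = {"DONE": "done", "REVIEW": "review"}
TERMINAL_STATUSES = {"done"}


def connect(db_path, busy_timeout_ms=30000):
    """Open a connection that waits on a locked database instead of failing at once."""
    conn = sqlite3.connect(db_path, timeout=busy_timeout_ms / 1000)
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn


def update_status(conn, ticket_id, status, attempts=5, backoff_s=0.2):
    """Apply a status UPDATE, retrying with exponential backoff if the DB is locked."""
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "UPDATE tickets SET status = ? WHERE id = ?",
                    (status, ticket_id),
                )
            return True
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc).lower():
                raise  # only lock contention is retried
            if attempt < attempts - 1:
                time.sleep(backoff_s * (2 ** attempt))
    return False


def reconcile(conn, signal_dir):
    """Re-apply durable DONE/REVIEW signals to tickets that never advanced.

    Safe to run on every harvest pass: tickets already at the target or
    at a terminal status are skipped, so repeated runs change nothing.
    """
    healed = []
    for path in sorted(Path(signal_dir).glob("MC-*.json")):
        signal = json.loads(path.read_text()).get("signal")
        target = SIGNAL_TO_STATUS.get(signal)
        if target is None:
            continue
        ticket_id = path.stem  # e.g. "MC-4353"
        row = conn.execute(
            "SELECT status FROM tickets WHERE id = ?", (ticket_id,)
        ).fetchone()
        if row is None or row[0] == target or row[0] in TERMINAL_STATUSES:
            continue
        if update_status(conn, ticket_id, target):
            healed.append(ticket_id)
    return healed
```

Calling `reconcile(connect("mc.db"), "state/mc-signals")` at the top of each harvest pass would let one lost UPDATE heal on the next cycle. Item 3 (retiring idle post-DONE runtimes) is not covered here and would still be needed to stop the done->in_progress bounce.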

Activity

done