MC-4831

Fix MC operator/pickup reliability — outside review findings
2026-06-13 07:37:27 SAST

State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 5d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_medium

Description

Based on external Hermes review (2026-05-24): /home/lucienne/workspace/reports/mc-operator-outside-review-2026-05-24.md

Control plane is alive but not cleanly trustworthy. Priority fixes:

1. **Single-flight pickup lock**: `needs-input-pickup` + `ticket-pickup` both run every minute and compete. Add global lock or merge into one dispatcher.
2. **Make needs-input-pickup truly needs-input-only**: currently it calls `dispatch()` and handles all todo tickets, not just requeued needs_input. Either fix or remove.
3. **Fix MC-4122 policy mismatch**: pickup sees `lucienne`-assigned tickets as runnable, but claim policy rejects them. Logs misleadingly say "already claimed." Fix assignee/campaign-owner guard and logging.
4. **Fix worker-count fail-open**: `active_workers_by_db()` returns 0 on failure, meaning "no workers active" → dispatch proceeds with full slots. Should fail closed or return MAX_WORKERS.
5. **Orchestrator inbox needs durable action proof**: inbox items marked `processed` after delivery to luci-persistent, not after per-ticket decision/action. Split states: pending → delivered → acted. Require structured result before done.
6. **Move semantic gates out of operator**: operator directly changes ticket status to done/in_review based on keywords. Should create orchestrator inbox items instead, not directly close work tickets.
7. **Structured completion proof**: audit still uses broad keywords (verified/fixed/implemented). Require mc-coord signals, attempt id, commit hash, test/deploy evidence.
8. **Operator idempotency**: operator tickets can stay suppressed 4h after done while condition persists. Add recurrence tracking or reopen prior ticket.
9. **Stop operator from self-modifying tasks**: `luci_operator.py` re-enables `ticket-pickup` when backlog exists. Add explicit maintenance policy flag.
10. **Centralize ticket mutations**: operator + pickup do direct SQLite writes bypassing API lifecycle. Centralize status transitions through CAS-semantic API calls.

Full findings + evidence in report.

Acceptance:

- One pickup owner with lock
- needs-input-only handles only needs_input requeues
- Policy-skip logging distinct from claim conflict
- Worker counting fails closed
- Inbox items only marked processed after durable action
- Operator observes/alerts, orchestrator decides semantic gates
- Completion requires structured evidence
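
Sketch for fix 1 (single-flight pickup lock). This is one possible shape, assuming both pickup jobs run on the same host and can share a lock file. The lock path and the `single_flight` helper are hypothetical names, not taken from the codebase. A job that fails to get the lock skips its run instead of competing.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock path; the ticket does not name one.
LOCK_PATH = "/tmp/mc-pickup.lock"


@contextmanager
def single_flight(path=LOCK_PATH):
    """Yield True if this caller holds the pickup lock, else False."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fail fast instead of queueing.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        try:
            yield acquired
        finally:
            if acquired:
                fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A pickup entry point would wrap its dispatch in `with single_flight() as held:` and return early when `held` is false. The kernel releases the lock if the process dies, so a crashed run cannot block later runs.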
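
Sketch for fix 4 (worker counting fails closed). The wrapper below is illustrative only: `count_fn` stands in for whatever `active_workers_by_db()` does internally, and the `MAX_WORKERS` value of 4 is an assumption, since the ticket names the constant but not its value. Any failure or implausible result is treated as "all slots busy".

```python
MAX_WORKERS = 4  # assumed value for illustration


def active_workers_fail_closed(count_fn, max_workers=MAX_WORKERS):
    """Return the active worker count, or max_workers if counting fails."""
    try:
        count = count_fn()
    except Exception:
        # Unknown state: assume every slot is taken.
        return max_workers
    if not isinstance(count, int) or count < 0:
        return max_workers
    return min(count, max_workers)


def free_slots(count_fn, max_workers=MAX_WORKERS):
    """Slots available for dispatch; zero whenever the count is unknown."""
    return max_workers - active_workers_fail_closed(count_fn, max_workers)
```

With this shape a database error results in zero free slots for that cycle, so dispatch pauses until counting works again.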
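
Sketch for fix 5 (inbox states pending → delivered → acted). A minimal state machine, assuming the structured result is a non-empty dict. The `InboxState` enum, the `advance` function, and the shape of the result are hypothetical.

```python
from enum import Enum


class InboxState(Enum):
    PENDING = "pending"
    DELIVERED = "delivered"
    ACTED = "acted"


# Only forward, single-step transitions are allowed.
ALLOWED = {
    (InboxState.PENDING, InboxState.DELIVERED),
    (InboxState.DELIVERED, InboxState.ACTED),
}


def advance(state, target, result=None):
    """Return the new state, or raise if the transition is not allowed."""
    if (state, target) not in ALLOWED:
        raise ValueError(f"illegal transition {state.value} -> {target.value}")
    if target is InboxState.ACTED and not (isinstance(result, dict) and result):
        # Delivery alone is not proof of action.
        raise ValueError("acted requires a structured result")
    return target
```

Delivery to luci-persistent would move an item to `delivered` only. Reaching `acted` requires the per-ticket decision to be recorded first.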
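
Sketch for fix 10 (CAS-semantic status transitions). The ticket asks for transitions to go through API calls. The function below only illustrates the compare-and-set idea at the SQLite level, and the `tickets` table with `id` and `status` columns is an assumed schema, not the real one.

```python
import sqlite3


def cas_transition(conn, ticket_id, expected, new):
    """Set status to `new` only if it is still `expected`; report success."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE tickets SET status = ? WHERE id = ? AND status = ?",
            (new, ticket_id, expected),
        )
    # rowcount is 0 when another writer changed the status first.
    return cur.rowcount == 1
```

A caller that gets `False` lost the race and should re-read the ticket. That outcome is a genuine claim conflict, which keeps it distinct from a policy skip decided before any write is attempted.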

Activity

done