
MC orchestration simplification — second opinion

Verdict

PASS with CAUTION. Proceed, but in phases, and do not remove any task that provides unique coverage until the references and monitors that point at it are updated.

Key risks

Recommended first phase

  1. Slow triage-untriaged to every 5 minutes.
  2. Run luci-operator-tuner weekly or on an anomaly trigger, and keep it Luci-owned unless it genuinely needs human policy/product review.
  3. Reduce reap-zombie-workers to hourly.
  4. Retire/demote mc-orchestrator-inbox-cleanup after confirming stale pending low scheduler rows are still zero.
  5. Canary-disable needs-input-pickup only after updating references in operator/watchdogs.
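
The phase 1 steps above can be written down as data, so that each change is applied only when its stated gate is met and the prior state is kept for rollback. This is a minimal sketch, not the scheduler's real interface: `TaskState`, `Change`, `apply_phase`, and the gate names `stale_rows_zero` and `references_updated` are hypothetical, and "retire/demote" is simplified here to "disable".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class TaskState:
    interval_s: Optional[int]  # None means the task is not on a fixed interval
    enabled: bool


@dataclass(frozen=True)
class Change:
    task: str
    interval_s: Optional[int] = None  # new interval; None keeps the old one
    enabled: Optional[bool] = None    # new enabled flag; None keeps the old one
    gate: Optional[str] = None        # hypothetical precondition name


# Phase 1 as listed in this review. Gate names are assumptions.
PHASE_1: List[Change] = [
    Change("triage-untriaged", interval_s=5 * 60),
    Change("luci-operator-tuner", interval_s=7 * 24 * 3600),
    Change("reap-zombie-workers", interval_s=3600),
    Change("mc-orchestrator-inbox-cleanup", enabled=False, gate="stale_rows_zero"),
    Change("needs-input-pickup", enabled=False, gate="references_updated"),
]


def apply_phase(
    state: Dict[str, TaskState], changes: List[Change], gates_met: Set[str]
) -> Tuple[Dict[str, TaskState], Dict[str, TaskState], List[str]]:
    """Return (new_state, snapshot_for_rollback, skipped_task_names)."""
    snapshot = dict(state)  # TaskState is frozen, so a shallow copy is enough
    new_state = dict(state)
    skipped: List[str] = []
    for change in changes:
        if change.gate is not None and change.gate not in gates_met:
            skipped.append(change.task)  # gate not confirmed: leave task as-is
            continue
        old = new_state[change.task]
        new_state[change.task] = TaskState(
            interval_s=old.interval_s if change.interval_s is None else change.interval_s,
            enabled=old.enabled if change.enabled is None else change.enabled,
        )
    return new_state, snapshot, skipped
```

Calling `apply_phase(current, PHASE_1, {"stale_rows_zero"})` would leave needs-input-pickup untouched and report it in the skipped list until the reference updates are confirmed. Restoring the returned snapshot puts every task back on its old schedule and enabled state.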

Leave for phase 2:

Rollback checks

Watch for 24–48h after each phase:

Roll back by restoring the old task schedule and enabled state if backlogs, missed alerts, stale tickets, or orphaned tmux sessions grow.
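
The rollback trigger can be made mechanical by comparing each signal against a baseline captured before the phase. The sketch below assumes the four signals are available as plain counts. The metric keys, the `tolerance` parameter, and the reading of "grow" as "exceeds baseline plus tolerance" are all assumptions, not something the current monitors are known to expose.

```python
from typing import Dict, List

# Signals named in the rollback criteria; the key spellings are hypothetical.
SIGNALS = ("backlog", "missed_alerts", "stale_tickets", "orphaned_tmux_sessions")


def grown_signals(
    baseline: Dict[str, int], current: Dict[str, int], tolerance: int = 0
) -> List[str]:
    """Return the signals whose current count exceeds baseline + tolerance."""
    return [
        name
        for name in SIGNALS
        if current.get(name, 0) > baseline.get(name, 0) + tolerance
    ]


def should_roll_back(
    baseline: Dict[str, int], current: Dict[str, int], tolerance: int = 0
) -> bool:
    """True if any watched signal grew beyond the allowed tolerance."""
    return bool(grown_signals(baseline, current, tolerance))
```

Evaluating this at intervals during the 24–48h watch window, rather than once at the end, keeps a slow-growing backlog from going unnoticed. A nonzero `tolerance` avoids rolling back on ordinary noise.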