Alive, but over-layered. The MC orchestration scheduled tasks are mostly firing and the core path is functioning, but the control plane has accumulated overlapping minute loops, watchdogs, cleanup jobs, and meta-tuning tasks. The simplification target should be: one minute-level intake/dispatch loop, one 15-minute operator loop, one external Hermes watchdog, and a small set of true cleanup/service monitors.
Evidence gathered read-only:
ticket-pickup, needs-input-pickup, triage-untriaged, persistent-luci-watchdog, queue-reaper, reap-zombie-workers, cron-watchdog, scheduler-watchdog, systemd-watchdog, and luci-operator all have fresh successful runs.
luci-persistent tmux session exists and the watchdog reports it online.
persistent-luci-branch-guard reports master.
MC control-plane watchdog (quiet, 15m) is enabled and running every 15m; latest archived run was silent, while latest state still records one human-action issue: MC-3896 awaiting Elmar.
HTTP Error 409 dispatch warnings around MC-4323, now parked as blocked after repeated runtime-send failures. That is not a scheduler-wide outage, but it proves the system still sees dispatch/runtimes churn.
MC control-plane watchdog (quiet, 15m)
~/.hermes/cron/state; Telegrams only when non-empty output is needed.
every 15m, last status ok, latest report says MC API 200, ccgram.service active, luci-dashboard.service active, core tasks fresh, one human-action wait (MC-3896).
luci-operator
Verdict: Keep, but narrow the role. It should be the 15-minute decision-maker/gatekeeper, not also another dispatcher/reaper/scheduler watchdog. Treat other jobs as sensors/actuators feeding it.
luci-operator-tuner
logs/luci_operator.jsonl, scores operator behavior, writes a report, files an MC ticket, and can ask a Claude subagent for patch suggestions.
MC-4333 / Operator tune - 2026-05-28, assigned to Elmar.
ticket-pickup
needs_input and in_review to orchestrator inbox, drains orchestrator inbox through luci-persistent, triages inbox mentions, dispatches Luci tickets, and dispatches Larry tickets.
HTTP Error 409 dispatch failures for MC-4323 and related tickets; latest run had no tickets at that moment.
Verdict: Keep as the single minute-level intake/dispatch loop. It already does most of the fast-path work.
needs-input-pickup
needs_input. Code comments say this was fixed to be “truly needs-input-only” under the same single-flight pickup lock, avoiding the old double-dispatch race.
ticket-pickup cadence.
Verdict: Candidate to retire or convert to event-driven. Its original body says normal pickup is every 15m, but normal pickup now runs every minute. If ticket-pickup can handle requeued replies fast enough, this task is redundant.
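The single-flight pickup lock mentioned above can be sketched as a non-blocking file lock: whichever pickup run takes the lock first proceeds, and any overlapping run skips instead of double-dispatching. This is only an illustration under assumptions. The lock path, `single_flight`, and `run_pickup` are hypothetical names, and the report does not say how the real lock is implemented.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock path; the real pickup lock location is not given in this report.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "mc-pickup-example.lock")


@contextmanager
def single_flight(path=LOCK_PATH):
    """Yield True if this caller holds the lock, False if another run holds it."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fail fast instead of queueing behind a peer.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def run_pickup(mode):
    """Run one pickup pass in the given mode, or skip if a pass is already running."""
    with single_flight() as held:
        if not held:
            return "skipped"
        return f"ran {mode}"
```

If both ticket-pickup and needs-input-pickup funnel through a guard like this, the second task adds latency coverage at most, which is the basis for the retire verdict.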
triage-untriaged
processed 0 ticket(s).
queue-reaper
0 expired, 0 reclaimed, 0 failed, 0 cleaned.
Verdict: Keep. It owns a distinct queue-table concern.
reap-zombie-workers
mc-MC-* sessions whose tickets are terminal (done/cancelled) after grace period. It is a safety net for inline cleanup.
ticket-pickup already calls reap_terminal_state_tmux_sessions() every minute for sessions whose ticket left todo/in_progress.
Verdict: Candidate to merge/reduce. Keep as a belt-and-braces independent fallback only if pickup/runtimes remain fragile; otherwise reduce to hourly or retire after pickup reaping is trusted.
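The reaping rule described above (mc-MC-* session, ticket in a terminal state, grace period elapsed) can be written as a pure selection function. This is a sketch under assumptions: the `Session` record, the 600-second grace value, and the placeholder session names are hypothetical, and the real reaper's data source is not shown in this report.

```python
from dataclasses import dataclass
from typing import List, Optional

TERMINAL_STATES = {"done", "cancelled"}  # terminal states named in the report
GRACE_SECONDS = 600                      # hypothetical grace period


@dataclass
class Session:
    name: str                        # tmux session name, e.g. "mc-MC-0001"
    ticket_status: str               # current status of the linked ticket
    terminal_since: Optional[float]  # epoch seconds when the ticket went terminal


def sessions_to_reap(sessions: List[Session], now: float,
                     grace: float = GRACE_SECONDS) -> List[str]:
    """Return names of worker sessions that are safe to kill."""
    return [
        s.name
        for s in sessions
        if s.name.startswith("mc-MC-")            # only ticket worker sessions
        and s.ticket_status in TERMINAL_STATES    # ticket is done or cancelled
        and s.terminal_since is not None
        and now - s.terminal_since >= grace       # grace period has elapsed
    ]
```

Keeping the selection separate from the kill step would let the same function serve both the inline reaping in ticket-pickup and a lower-frequency fallback run.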
worktree-reaper
Verdict: Keep. Distinct low-frequency maintenance.
mc-orchestrator-inbox-cleanup
orchestrator_inbox as processed after 24h. Body says insert-time auto-expiry now exists, so this is belt-and-braces legacy cleanup.
persistent-luci-watchdog
luci-persistent liveness, restarts after 2 consecutive offline observations, and manages context compaction/clear ladder.
luci-persistent OK, online, tokens below thresholds.
Verdict: Keep. This is core to the orchestrator control surface.
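The "restart after 2 consecutive offline observations" rule amounts to a small debounce counter. A minimal sketch follows, assuming one observation per watchdog run. `LivenessTracker` is a hypothetical name, and resetting the streak after a restart is an assumption, not something the report confirms.

```python
class LivenessTracker:
    """Debounce liveness probes so a single missed probe does not trigger a restart."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold  # consecutive offline probes required to restart
        self.offline_streak = 0

    def observe(self, online: bool) -> bool:
        """Record one probe result; return True when a restart should be triggered."""
        if online:
            self.offline_streak = 0
            return False
        self.offline_streak += 1
        if self.offline_streak >= self.threshold:
            # Assumption: a restart clears the streak so the next probe starts fresh.
            self.offline_streak = 0
            return True
        return False


tracker = LivenessTracker(threshold=2)
decisions = [tracker.observe(o) for o in (True, False, True, False, False, False)]
# decisions == [False, False, False, False, True, False]
```

A branch check from persistent-luci-branch-guard could run in the same pass as this probe, which is what the merge recommendation for that task relies on.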
persistent-luci-branch-guard
master; alerts on drift, does not autoswitch.
branch-guard: ok (master).
Verdict: Merge into persistent-luci-watchdog. It is tiny and related to the same persistent-session health domain.
rotate-luci-session
orchestrator-board-sweep
in_review and needs_input into orchestrator inbox; releases waiting tickets when blockers recovered.
luci-operator also audits stale ticket state; ticket-pickup also triages stale needs_input/in_review into inbox.
Verdict: Candidate to fold into luci-operator. If the operator is the 15-minute board decision-maker, hourly board sweep can become a subroutine rather than a separate task.
scheduler-watchdog
cron-watchdog also monitor task freshness/overdue state.
Verdict: Keep, but make it the only MC-internal scheduler-ticket creator. It is useful because it creates durable MC tickets; avoid duplicate Telegram alerts from other scheduler monitors.
cron-watchdog
/api/v1/scheduler/inspect and Telegram-alerts on overdue tasks.
scheduler-watchdog checks missed fires/failures and creates MC tickets; Hermes external watchdog checks core task freshness outside the scheduler.
Verdict: Candidate to retire. A scheduler task watching the scheduler is weaker than Hermes external cron, and it duplicates scheduler-watchdog. Keep only if it covers non-core overdue alerts not covered elsewhere.
systemd-watchdog
Verdict: Keep. Distinct service-layer monitor. Could feed its results to the external Hermes watchdog eventually, but not urgent.
pickup-watchdog
enabled: false, disabled_reason: retired.
ticket-pickup, needs-input-pickup, triage-untriaged.
ticket-pickup, orchestrator-board-sweep, and luci-operator.
ticket-pickup has reaping; reap-zombie-workers also reaps.
scheduler-watchdog, and cron-watchdog all inspect task freshness/failures.
persistent-luci-watchdog, persistent-luci-branch-guard, rotate-luci-session, plus systemd-watchdog all touch parts of persistent Luci health.
luci-operator-tuner filed an Elmar ticket.
MC control-plane watchdog (quiet, 15m).
Purpose: outside-scheduler truth source; quiet on health; alert only genuine human/action issues.
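The "quiet on health" behaviour can be sketched as a report builder that returns an empty string when every check passes, with a notifier that fires only on non-empty output. The check names, `build_report`, `run_watchdog`, and the `notify` callback are all hypothetical; the real watchdog's checks and Telegram integration are not shown in this report.

```python
from typing import Callable, Dict, Tuple

Check = Tuple[bool, str]  # (ok, human-readable detail)


def build_report(checks: Dict[str, Check]) -> str:
    """Return one line per failing check, or an empty string when all checks pass."""
    problems = [
        f"{name}: {detail}"
        for name, (ok, detail) in sorted(checks.items())
        if not ok
    ]
    return "\n".join(problems)


def run_watchdog(checks: Dict[str, Check], notify: Callable[[str], None]) -> str:
    """Evaluate checks and call notify only when there is something to report."""
    report = build_report(checks)
    if report:  # silence on health: no message at all when the report is empty
        notify(report)
    return report


sent = []
run_watchdog({"mc_api": (True, "HTTP 200"), "core_tasks": (True, "fresh")}, sent.append)
run_watchdog({"mc_api": (True, "HTTP 200"), "human_action": (False, "ticket awaiting reply")},
             sent.append)
# sent now holds exactly one message, from the second run
```

Because the sentinel runs outside the scheduler, an empty report is only meaningful if the job itself is known to have run, so its last-run timestamp needs to be recorded even when nothing is sent.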
Single minute intake/dispatch
ticket-pickup.
Fold into it or trigger from it:
needs-input-pickup
triage-untriaged
needs_input/in_review routing
Single 15-minute operator
luci-operator.
Should not compete with the minute dispatcher.
Cleanup lane
queue-reaper, worktree-reaper.
Decide whether reap-zombie-workers remains independent or becomes less frequent.
Service/session lane
persistent-luci-watchdog, rotate-luci-session, systemd-watchdog.
Merge: persistent-luci-branch-guard into persistent watchdog.
Scheduler-ticket creator
scheduler-watchdog for durable MC tickets.
Retire: cron-watchdog, unless it has unique overdue coverage.
Retire needs-input-pickup. Normal ticket-pickup already runs every minute and uses the same pickup lock.
Fold triage-untriaged into the single intake loop or slow it to every 5 minutes. It usually processes zero.
Merge persistent-luci-branch-guard into persistent-luci-watchdog. Same domain, tiny check.
Retire cron-watchdog after confirming Hermes external watchdog + scheduler-watchdog cover the same overdue conditions. Avoid duplicate scheduler monitors.
Fold orchestrator-board-sweep into luci-operator or keep only as a subroutine. Same board-hygiene domain.
Reduce reap-zombie-workers to hourly or make it emergency fallback only. Pickup already does minute-level session reaping.
Move luci-operator-tuner from nightly Elmar-facing tickets to weekly/anomaly-triggered Luci-owned reports. This cuts meta-noise.
Retire mc-orchestrator-inbox-cleanup if insert-time auto-expiry is confirmed. It is legacy direct SQL.
ticket-pickup: every minute; single intake/dispatch/requeue/triage-light loop.
luci-operator: every 15m; high-agency board/runtime/scheduler decision pass.
MC control-plane watchdog: every 15m; outside-scheduler sentinel.
scheduler-watchdog: hourly; durable MC tickets for scheduler faults.
persistent-luci-watchdog: every 5m; liveness/context/branch health.
rotate-luci-session: daily.
queue-reaper: every 15m.
worktree-reaper: weekly.
systemd-watchdog: hourly.
Everything else should either be retired, merged, or downgraded to lower cadence.
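The target cadences above can be captured as a small table for diffing against the task list the scheduler actually has. This is a sketch only: the dictionary layout and helper names are hypothetical, intervals are in minutes, and the MC control-plane watchdog is listed even though it runs from external Hermes cron rather than the MC scheduler.

```python
# Target cadence per task, in minutes, as listed in this report.
TARGET_SCHEDULE = {
    "ticket-pickup": 1,
    "luci-operator": 15,
    "MC control-plane watchdog": 15,   # external Hermes cron, not the MC scheduler
    "scheduler-watchdog": 60,
    "persistent-luci-watchdog": 5,
    "rotate-luci-session": 24 * 60,
    "queue-reaper": 15,
    "worktree-reaper": 7 * 24 * 60,
    "systemd-watchdog": 60,
}


def runs_per_day(interval_minutes: int) -> float:
    """Number of fires per 24 hours for a fixed-interval task."""
    return 1440 / interval_minutes


def outside_target(current_tasks):
    """Tasks present today that the target keeps no slot for (retire, merge, or downgrade)."""
    return sorted(set(current_tasks) - set(TARGET_SCHEDULE))


# Task names taken from this report's inventory.
current = [
    "ticket-pickup", "needs-input-pickup", "triage-untriaged", "luci-operator",
    "cron-watchdog", "scheduler-watchdog", "persistent-luci-branch-guard",
    "persistent-luci-watchdog", "orchestrator-board-sweep", "reap-zombie-workers",
]
leftovers = outside_target(current)
```

Here `leftovers` lists the tasks that still need an explicit retire, merge, or downgrade decision before the target set is the whole schedule.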
MC-3896 only in the external watchdog report.