Phase 1: simplify MC orchestration scheduled tasks safely
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 16d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_high · cwd /home/lucienne/workspace/mission-control · uptime 15d 21h · last activity 15d 19h ago
Description
MC-4381
Implement Phase 1 of MC orchestration scheduled-task simplification, after read-only review + second opinion.
Context reports:
- http://100.118.207.3:3001/md-view?file=/home/lucienne/workspace/reports/mc-orchestration-simplification-review-2026-05-28.md
- http://100.118.207.3:3001/md-view?file=/home/lucienne/workspace/reports/mc-orchestration-simplification-second-opinion-2026-05-28.md
Scope: Phase 1 only, reversible and low-risk.
Required changes:
1. Slow `triage-untriaged` from every minute to every 5 minutes. Preserve existing Gemini/deterministic assignment behavior unchanged.
2. Change `luci-operator-tuner` from nightly Elmar-facing noise to weekly or anomaly-triggered Luci-owned behavior. Prefer weekly first unless implementing anomaly trigger is trivial and well tested. It must not create routine Elmar-facing tickets unless a genuine product/policy/human decision is required.
3. Reduce `reap-zombie-workers` from every 10 minutes to hourly. Do not retire it yet.
4. For `mc-orchestrator-inbox-cleanup`, first verify stale pending low scheduler rows older than 24h are zero. If yes, disable/demote it with explicit disabled_reason or low cadence. Do not delete. If not zero, leave it enabled and report why.
5. Do NOT retire `needs-input-pickup` in this phase. Instead update/prepare references only if needed so phase 2 can canary-disable it safely later. Do not break current pickup path.
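The stale-row check in item 4 can be sketched as a single count query. This is a minimal illustration only: the table name `orchestrator_inbox`, its columns, and the use of SQLite are assumptions made for the example, not the actual MC schema.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: the real inbox table and column names may differ.
SCHEMA = """
CREATE TABLE orchestrator_inbox (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    priority TEXT NOT NULL,
    source TEXT NOT NULL,
    created_at TEXT NOT NULL  -- ISO 8601 timestamp in UTC
)
"""


def count_stale_rows(conn, now, max_age_hours=24):
    """Count pending low-priority scheduler rows older than max_age_hours."""
    cutoff = (now - timedelta(hours=max_age_hours)).isoformat()
    row = conn.execute(
        "SELECT COUNT(*) FROM orchestrator_inbox "
        "WHERE status = 'pending' AND priority = 'low' "
        "AND source = 'scheduler' AND created_at < ?",
        (cutoff,),
    ).fetchone()
    return row[0]


def demo():
    """Build an in-memory table with sample rows and return the stale count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = datetime(2026, 5, 28, 12, 0, tzinfo=timezone.utc)
    rows = [
        ("pending", "low", "scheduler", now - timedelta(hours=30)),     # stale
        ("pending", "low", "scheduler", now - timedelta(hours=2)),      # too fresh
        ("done", "low", "scheduler", now - timedelta(hours=48)),        # not pending
        ("pending", "normal", "scheduler", now - timedelta(hours=48)),  # not low
    ]
    conn.executemany(
        "INSERT INTO orchestrator_inbox (status, priority, source, created_at) "
        "VALUES (?, ?, ?, ?)",
        [(s, p, src, ts.isoformat()) for s, p, src, ts in rows],
    )
    return count_stale_rows(conn, now)
```

Under these assumptions, the cleanup task would be demoted only when `count_stale_rows` returns 0; any other value means leaving it enabled and reporting the count.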
Do not change in Phase 1:
- `cron-watchdog`
- `orchestrator-board-sweep`
- `persistent-luci-branch-guard`
- `persistent-luci-watchdog`
- `ticket-pickup`
- Hermes external cron watchdog
Process requirements:
- Use the mandatory single-session dev-loop before editing source/task files.
- Keep changes minimal and reversible.
- Commit and push if tests/validation pass.
- Add a concise ticket comment with what changed and what was intentionally left for phase 2.
Validation/acceptance:
- `python3 scheduler.py list` or equivalent task parser check succeeds.
- Confirm changed task frontmatter parses and schedules are as intended.
- Query recent task definitions/API after change and verify no accidental disable of core pickup/operator/watchdog.
- Run/directly verify any touched scripts if logic changed.
- Verify `ticket-pickup`, `luci-operator`, `persistent-luci-watchdog`, and Hermes external watchdog remain enabled/fresh.
- If disabling/demoting inbox cleanup, include the stale-row count used to justify it.
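The frontmatter checks above could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes task files open with a `---` delimited frontmatter block, and the keys `schedule` and `enabled` are hypothetical names chosen for illustration. `python3 scheduler.py list` remains the authoritative parser check.

```python
# Target schedules for the four changed tasks (Phase 1 intent).
EXPECTED_SCHEDULES = {
    "triage-untriaged": "*/5 * * * *",
    "reap-zombie-workers": "0 * * * *",
    "luci-operator-tuner": "30 1 * * 1",
    "mc-orchestrator-inbox-cleanup": "15 4 * * 0",
}

# Core tasks that must not be disabled by accident.
MUST_STAY_ENABLED = {"ticket-pickup", "luci-operator", "persistent-luci-watchdog"}


def parse_frontmatter(text):
    """Parse simple 'key: value' lines between the leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


def check_tasks(task_texts):
    """Return a list of human-readable problems; an empty list means all good.

    task_texts maps a task name to the full text of its task file.
    """
    problems = []
    for name, text in sorted(task_texts.items()):
        fields = parse_frontmatter(text)
        want = EXPECTED_SCHEDULES.get(name)
        if want is not None and fields.get("schedule") != want:
            problems.append(
                f"{name}: schedule is {fields.get('schedule')!r}, expected {want!r}"
            )
        # Assumption: a missing 'enabled' key means the task is enabled.
        if name in MUST_STAY_ENABLED and fields.get("enabled", "true").lower() != "true":
            problems.append(f"{name}: must stay enabled")
    return problems
```

The line-based parser is deliberately naive and would need replacing with a real YAML parser if the frontmatter uses nested or multi-line values.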
Expected check-in: 2026-05-28T15:00:00+02:00
Activity
Details: Done · High · Luci
Operator console: ticket_marked_done. MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here. Raw console: luci · claude/anthropic/opus[1m]
Phase 1 landed (commit on master, pushed). Changes — all reversible, config + one small script-logic edit:
1. triage-untriaged: `* * * * *` → `*/5 * * * *`. Gemini/deterministic assignment behavior unchanged (frontmatter-only).
2. reap-zombie-workers: `*/10 * * * *` → `0 * * * *` (hourly). Not retired. Inline update_ticket reaping + minute-level ticket-pickup reaping cover the fast path; this stays as slow fallback.
3. luci-operator-tuner: nightly `30 1 * * *` → weekly `30 1 * * 1` (Mon 01:30 UTC). Now Luci-owned: scripts/luci_operator_tuner.py no longer files an Elmar-facing MC ticket + Telegram per run; routine weekly report is surfaced to orchestrator_inbox (priority `normal`, source `scheduler`) so it lands in Luci's pickup digest. Luci opens a real ticket only if a finding needs a genuine product/policy/human decision. Report file still written. Chose weekly over anomaly-trigger (simpler, well-understood, low-risk).
4. mc-orchestrator-inbox-cleanup: verified stale pending low/scheduler rows >24h = **0** on 2026-05-28 → demoted daily → weekly (`15 4 * * 0`). Kept enabled as a rare backstop, NOT deleted/disabled, so a regression in the insert-time auto-expiry path still gets caught. Restore-to-daily instructions in the task body.
5. needs-input-pickup: left fully unchanged this phase (do-not-break). Phase-2 canary-disable still needs operator/scheduler-watchdog/Hermes references made disabled-aware first.
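To make the cadence reduction concrete, the sketch below counts how often each old and new cron expression fires in one week. It is a deliberately minimal matcher written for this comment, not the scheduler's own parser: it handles only `*`, `*/n`, and plain integers, which covers the expressions listed above.

```python
from datetime import datetime, timedelta


def field_matches(expr, value):
    """Match one cron field; supports '*', '*/n', and a single integer only."""
    if expr == "*":
        return True
    if expr.startswith("*/"):
        return value % int(expr[2:]) == 0
    return value == int(expr)


def cron_matches(expr, when):
    """Return True if the 5-field cron expression fires at minute `when`."""
    minute, hour, dom, month, dow = expr.split()
    # Python: Monday=0 .. Sunday=6. Cron: Sunday=0, Monday=1 .. Saturday=6.
    cron_dow = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow)
    )


def runs_per_week(expr, start=datetime(2026, 6, 1)):
    """Count firings across the 7 * 24 * 60 minutes starting at `start`."""
    return sum(
        cron_matches(expr, start + timedelta(minutes=i))
        for i in range(7 * 24 * 60)
    )


# (task, old schedule, new schedule) as listed in the changes above.
CHANGES = [
    ("triage-untriaged", "* * * * *", "*/5 * * * *"),
    ("reap-zombie-workers", "*/10 * * * *", "0 * * * *"),
    ("luci-operator-tuner", "30 1 * * *", "30 1 * * 1"),
    ("mc-orchestrator-inbox-cleanup", "15 4 * * *", "15 4 * * 0"),
]

if __name__ == "__main__":
    for task, old, new in CHANGES:
        print(f"{task}: {runs_per_week(old)} -> {runs_per_week(new)} runs/week")
```

With these expressions the weekly run counts drop from 10080 to 2016, 1008 to 168, 7 to 1, and 7 to 1 respectively.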
Validation:
- `python3 scheduler.py list` parses all changed tasks; schedules read as intended.
- Core path confirmed enabled/fresh: ticket-pickup, luci-operator, needs-input-pickup, persistent-luci-watchdog (all enabled=yes); Hermes external watchdog state file fresh (last check 13:36, untouched).
- luci_operator_tuner.py compiles; verified pickup inbox-drain delivers a `normal`-priority scheduler row into the digest (low-only batches are routine-skipped, so `normal` is the correct choice to ensure Luci sees the report).
- Committed only the 5 in-scope files; unrelated working-tree changes left untouched.
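The reasoning behind choosing `normal` priority can be illustrated with a toy version of the skip rule. The function below is hypothetical and only encodes the behavior described above (a batch made up entirely of `low` rows is skipped); the real pickup inbox-drain logic is not reproduced here.

```python
def batch_is_delivered(batch):
    """Return True if at least one row in the batch is not 'low' priority.

    Assumed rule, taken from the validation note above: low-only batches
    are routine-skipped, so one 'normal' row is enough to surface the batch.
    """
    return any(row.get("priority") != "low" for row in batch)


# A weekly tuner report filed as 'low' could be skipped with its batch,
# while the same report filed as 'normal' would reach the digest.
low_only = [{"priority": "low", "source": "scheduler"}]
with_report = low_only + [{"priority": "normal", "source": "scheduler"}]
```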
Left for Phase 2 (per second opinion): needs-input-pickup retirement/canary, cron-watchdog demote/retire, orchestrator-board-sweep fold into luci-operator, persistent-luci-branch-guard merge into persistent-luci-watchdog.
Rollback: revert the commit, or restore each previous schedule: triage-untriaged `* * * * *`, reap-zombie-workers `*/10 * * * *`, luci-operator-tuner `30 1 * * *`, mc-orchestrator-inbox-cleanup `15 4 * * *`.
Shadow reviewer verdict: FAIL (advisory only). Ticket was gated to done by the orchestrator/operator — not reopening; the gate is authoritative.
Reasons: The committed diff (done_sha=eb4de427) contains only a single-line change to tests/conftest.py adding a stub for extract_tool_results. There is no diff touching any task frontmatter, scheduler configuration, or script logic for triage-untriaged, reap-zombie-workers, luci-operator-tuner, mc-orchestrator-inbox-cleanup, or luci_operator_tuner.py. The worker's comments describe extensive changes (cron schedule modifications, script logic edits) but the actual diff shows none of these. Additionally, the dirty_tree=1 flag indicates uncommitted changes, meaning the claimed work is not fully committed. No test evidence for the described changes exists in the diff either.
Gaps:
- No committed changes to triage-untriaged schedule (*/5 * * * *)
- No committed changes to reap-zombie-workers schedule (hourly)
- No committed changes to luci-operator-tuner schedule or script logic
- No committed changes to mc-orchestrator-inbox-cleanup schedule or stale-row verification
- No committed changes to luci_operator_tuner.py removing Elmar-facing ticket filing
- No test evidence exercising any of the described schedule or logic changes
- dirty_tree=1 indicates work not fully committed
- No evidence of stale pending low scheduler rows count for inbox-cleanup decision
luci · 16d ago
Post-close controller review requested by Elmar: PASS with minor caution — keep 64926ae3, do not revert.
I independently reviewed the actual workspace commit and asked a separate reviewer to inspect it too.
Findings:
- Commit 64926ae3 touches only the intended Phase 1 files: scripts/luci_operator_tuner.py plus the four task files.
- Phase 1 scope matched: triage-untriaged 1min→5min; reap-zombie-workers 10min→hourly; luci-operator-tuner nightly→weekly and no routine Elmar ticket/Telegram; mc-orchestrator-inbox-cleanup daily→weekly after stale low scheduler rows verified 0.
- Phase 2/prohibited items were not touched: needs-input-pickup, ticket-pickup, cron-watchdog, orchestrator-board-sweep, persistent-luci-*, Hermes external watchdog.
- Scheduler parser confirms intended next runs; py_compile passes for luci_operator_tuner.py; stale pending low scheduler rows >24h remains 0.
Minor caution: the tuner is now weekly but still analyzes a 24h recent / 72h extended window. That is not unsafe, but if we want a true weekly review we should later decide whether to expand the lookback to 168h. This is follow-up polish, not a reason to revert.
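The gap is simple arithmetic: the fraction of the interval between runs that a lookback window actually covers. A small sketch, using only the window sizes mentioned above:

```python
def lookback_coverage(lookback_hours, interval_hours):
    """Fraction of the time between runs that the lookback window covers."""
    return min(lookback_hours / interval_hours, 1.0)


WEEK_HOURS = 7 * 24  # 168

if __name__ == "__main__":
    for window in (24, 72, 168):
        share = lookback_coverage(window, WEEK_HOURS)
        print(f"{window}h lookback on a weekly run covers {share:.0%} of the week")
```

On the old nightly cadence the 24h window covered the full interval between runs. On the weekly cadence the 24h and 72h windows cover roughly 14% and 43% of the week, so activity from earlier in the week is not analyzed unless the lookback is widened to 168h.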
Conclusion: the MC-4381 implementation itself is sound. The failure was the MC review/orchestration gate reading the wrong repo/sha and appending a false FAIL after closure (tracked on MC-4371/MC-4357).
luci · 16d ago
Controller triage after Elmar challenged the closure: the confusing failure comment is real evidence of a review-gate/control-plane bug, not evidence that MC-4381's Phase 1 work failed.
Manual verification now:
- Actual owning repo for this ticket was /home/lucienne/workspace, not /home/lucienne/workspace/mission-control.
- origin/master is at 64926ae3f95717605c1546083cb343a4468d4ad7.
- Commit 64926ae3 touches exactly the intended 5 files: scripts/luci_operator_tuner.py, tasks/luci-operator-tuner.md, tasks/mc-orchestrator-inbox-cleanup.md, tasks/reap-zombie-workers.md, tasks/triage-untriaged.md.
- Live task files currently show the intended schedules: triage */5, reap hourly, operator-tuner weekly, inbox-cleanup weekly.
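The file-scope check above can be repeated with a short script. This is a sketch, not the review gate's implementation: it shells out to `git show --name-only` for the given repo and sha, then compares the result against the five expected paths. The helper names are made up for this example.

```python
import subprocess

# The five in-scope files named in the verification above.
EXPECTED_FILES = {
    "scripts/luci_operator_tuner.py",
    "tasks/luci-operator-tuner.md",
    "tasks/mc-orchestrator-inbox-cleanup.md",
    "tasks/reap-zombie-workers.md",
    "tasks/triage-untriaged.md",
}


def files_in_commit(repo, sha):
    """Return the set of paths touched by `sha` in the git repo at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "show", "--name-only", "--pretty=format:", sha],
        check=True,
        capture_output=True,
        text=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def scope_diff(touched, expected):
    """Report files touched outside the scope and expected files not touched."""
    return {
        "unexpected": sorted(touched - expected),
        "missing": sorted(expected - touched),
    }


if __name__ == "__main__":
    touched = files_in_commit("/home/lucienne/workspace", "64926ae3")
    print(scope_diff(touched, EXPECTED_FILES))
```

Pointing the same check at the wrong repo or sha would produce a diff like the one the shadow reviewer saw: one unexpected file and all five expected files missing.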
What failed: the shadow reviewer looked at the mission-control runtime cwd/slot branch done_sha (eb4de42, tests/conftest.py) instead of the workspace commit that actually landed the scheduled-task changes, then appended an advisory FAIL after the controller had already closed the ticket. That is exactly the MC-4371 class of cross-repo/done_sha review-gate bug.
Action taken: normalized this ticket's done_sha to the real workspace commit 64926ae3 and cleared the dirty flag for this ticket's committed artifact. Other uncommitted changes in the workspace are unrelated to MC-4381.