Operator Tuner Report

Window: last 24h (extended 72h for recurrence). Runs: 52. Total actions: 637. Avg actions/run: 12.25.

Action mix (recent window)

    75  operator_ticket_exists
    70  repo_dirty_observed
    52  run_start
    52  disk_worktree_snapshot
    52  memory_snapshot
    52  done_audit_summary
    52  run_complete
    49  blocked_lane_classification_summary
    33  promote_dead_zone_ticket
    25  recent_task_failure
    14  operator_dev_loop_finished
    13  active_lane_snapshot
    13  todo_backlog_observed
    13  terminal_ticket_runtime_closed
    12  active_lane_backlog_observed
    11  reopen_weak_done_ticket
    10  attach_operator_context_to_closed_failure_ticket
     7  operator_dev_loop_throttled
     5  create_operator_ticket
     4  breakglass_noop_healthy
     4  pickup_direct_run
     4  blocked_lane_completed_marked_done
     3  repo_dirty_all_known_generated
     3  memory_pressure_observed
     2  create_task_failure_ticket
     2  needs_input_runtime_failure_reset
     1  operator_ticket_resolved
     1  board_triage
     1  fix_deployed
     1  memory_recovered
     1  needs_input_pickup_diagnosis

Signals worth tuning

Recurring operator-ticket markers (>=3 in 72h): {'[operator:dirty-repo:workspace]': 3, '[operator:active-lane-backlog]': 3, '[operator:dirty-repo:mission-control]': 3}
Weak-done reopens: 11
Stale-runtime marks (extended window): 4
Dev-loop throttle ratio: 0.33 (7 throttled of 21 dev-loop triggers; 14 finished)
Repo-dirty observations: 70 (generated-only noise: 3)
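
The derived signals above can be reproduced from the raw action counts. The following is a minimal sketch, assuming the throttle ratio is the share of dev-loop triggers that were throttled (throttled divided by throttled plus finished); the variable names are illustrative and do not come from the operator code.

```python
# Counts copied from the action mix in this report.
throttled = 7              # operator_dev_loop_throttled
finished = 14              # operator_dev_loop_finished
dirty_observed = 70        # repo_dirty_observed
dirty_generated_only = 3   # repo_dirty_all_known_generated

# Assumption: ratio = throttled / (throttled + finished).
throttle_ratio = throttled / (throttled + finished)

# Share of dirty-repo observations that were generated-only noise.
generated_noise_share = dirty_generated_only / dirty_observed

print(f"throttle ratio: {throttle_ratio:.2f}")
print(f"generated-only share: {generated_noise_share:.1%}")
```

Under this assumption the script prints 0.33 for the throttle ratio, which matches the reported figure; dividing throttled by finished alone would give 0.50.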

Reopened-ticket outcomes (current MC state)

MC-3450: status=done updated=2026-05-15T11:35:43.237560+02:00 failure=—
MC-3451: status=cancelled updated=2026-05-15T11:00:16.185239+02:00 failure=—
MC-3454: status=done updated=2026-05-15 08:34:11 failure=—
MC-3455: status=done updated=2026-05-15 08:34:11 failure=—
MC-3456: status=done updated=2026-05-15T08:30:20.401016+02:00 failure=—
MC-3457: status=done updated=2026-05-15 08:34:11 failure=—
MC-3458: status=done updated=2026-05-15 08:34:11 failure=—
MC-3461: status=done updated=2026-05-15T11:30:26.707253+02:00 failure=—
MC-3462: status=done updated=2026-05-15 07:25:00 failure=—
MC-3511: status=done updated=2026-05-15T21:21:53.786635+02:00 failure=—
MC-3512: status=in_review updated=2026-05-15T21:30:10.875359+02:00 failure=Weak completion evidence found by Luci Operator

Proposed patches

Operator Tuning Recommendations

Active-lane threshold is too low, creating recurring noise
Problem: [operator:active-lane-backlog] recurs ≥3 times in 72h, and active_lane_backlog_observed fired 12 times across 52 runs. The board likely carries 8–12 active tickets as normal throughput, so a trigger of 10 escalates on ordinary load.
Location: ACTIVE_LANE_COUNT_TRIGGER = 10
Change: Set to 15.
Expected effect: active_lane_backlog_observed count drops; operator ticket only opens when the board is genuinely congested. The ACTIVE_LANE_STALE_HOURS_TRIGGER = 2 still catches a single stuck ticket.
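
How the two triggers might interact after the change can be sketched as follows. This is an illustration only: the ActiveTicket shape, the backlog_trigger_reason helper, the ticket ids, and the use of >= comparisons are assumptions; only the two constant names and their values come from this report.

```python
from dataclasses import dataclass
from typing import List, Optional

ACTIVE_LANE_COUNT_TRIGGER = 15       # proposed value (currently 10)
ACTIVE_LANE_STALE_HOURS_TRIGGER = 2  # unchanged


@dataclass
class ActiveTicket:
    """Hypothetical view of a ticket in the active lane."""
    ticket_id: str
    hours_since_update: float


def backlog_trigger_reason(active: List[ActiveTicket]) -> Optional[str]:
    """Return why a backlog ticket would open, or None if the board looks healthy."""
    if len(active) >= ACTIVE_LANE_COUNT_TRIGGER:
        return "count"
    if any(t.hours_since_update >= ACTIVE_LANE_STALE_HOURS_TRIGGER for t in active):
        return "stale"
    return None


# Twelve fresh tickets: normal throughput, no escalation at a trigger of 15.
busy_board = [ActiveTicket(f"T-{i}", 0.5) for i in range(12)]
# Two tickets, one untouched for three hours: still caught by the stale trigger.
stuck_board = [ActiveTicket("T-1", 0.5), ActiveTicket("T-2", 3.0)]

print(backlog_trigger_reason(busy_board))
print(backlog_trigger_reason(stuck_board))
```

With these assumed inputs, a 12-ticket board no longer escalates, while a single stale ticket still does.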

Reopened runnable-assignee tickets land in in_review and get stuck
Problem: _reopen_status sends non-scheduled-source RUNNABLE_ASSIGNEE tickets to in_review. MC-3512 has been sitting in in_review for ~14h with failure_reason="Weak completion evidence found by Luci Operator" — no mechanism re-picks it up.
Location: _reopen_status method, final branch (the return "in_review" on the last line of the method).
Change: Return "todo" for any ticket assigned to a RUNNABLE_ASSIGNEE, not just scheduled-source ones; fall through to "in_review" only for non-runnable assignees. The reopen cap (_steward_reopen_count >= 1) prevents loops.
Expected effect: Reopened worker tickets re-enter the pickup queue instead of stalling in review. Eliminates the MC-3512 class of stuck tickets.
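
The proposed routing could look roughly like the sketch below. It is not the actual _reopen_status implementation: the Ticket dataclass, the contents of RUNNABLE_ASSIGNEES, and returning None when the cap is reached are all assumptions made for illustration. Only the status strings and the reopen-count cap are taken from this report.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of assignees that a worker can pick up automatically.
RUNNABLE_ASSIGNEES = {"worker"}


@dataclass
class Ticket:
    """Hypothetical ticket shape; field names are illustrative."""
    assignee: str
    source: str
    steward_reopen_count: int = 0


def reopen_status(ticket: Ticket) -> Optional[str]:
    """Pick the status for a weak-done reopen, or None if the cap blocks it."""
    # Assumed placement of the cap: one reopen per ticket, which prevents loops.
    if ticket.steward_reopen_count >= 1:
        return None
    # Proposed change: every runnable assignee goes back to the pickup queue,
    # regardless of whether the ticket came from a scheduled source.
    if ticket.assignee in RUNNABLE_ASSIGNEES:
        return "todo"
    # Non-runnable assignees still wait for review.
    return "in_review"


print(reopen_status(Ticket(assignee="worker", source="manual")))
print(reopen_status(Ticket(assignee="human", source="manual")))
print(reopen_status(Ticket(assignee="worker", source="manual", steward_reopen_count=1)))
```

Because the ticket's source is no longer consulted, a ticket like MC-3512 would return to todo instead of waiting in in_review.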

PROGRESS: Done is not recognized as strong completion evidence
Problem: STRONG_EVIDENCE_PATTERNS does not include the PROGRESS_DONE_MARKER pattern. A worker comment "PROGRESS: Done — added logging to the scheduler" contains no word from the current strong patterns (no fixed/resolved/implemented/updated/created/delivered/sent), so the ticket is flagged as weak and reopened.
Location: STRONG_EVIDENCE_PATTERNS tuple.
Change: Add re.compile(r"^\s*PROGRESS:\s*Done\b", re.I | re.M) to the tuple.
Expected effect: Fewer false weak-done reopens for tickets where the worker explicitly marked progress as done. Estimated reduction: 2–4 of the current 11 reopens per day.
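
The behaviour of the proposed pattern can be checked in isolation. The snippet below is a small sketch that uses only the regular expression from the Change line; the has_progress_done helper and the sample comments are made up for illustration (the first sample paraphrases the worker comment quoted above).

```python
import re

# Pattern exactly as proposed for STRONG_EVIDENCE_PATTERNS.
PROGRESS_DONE = re.compile(r"^\s*PROGRESS:\s*Done\b", re.I | re.M)


def has_progress_done(comment: str) -> bool:
    """True if any line of the comment starts with a PROGRESS: Done marker."""
    return PROGRESS_DONE.search(comment) is not None


samples = [
    "PROGRESS: Done - added logging to the scheduler",  # marker on first line
    "Looked into it.\n  progress: done",                # later line, lower case
    "PROGRESS: Doing the migration next",               # "Doing" is not "Done"
    "Not yet PROGRESS: Done",                           # marker not at line start
]

for text in samples:
    print(has_progress_done(text), repr(text))
```

The re.M flag lets ^ match at the start of every line, so a marker on a later line of a multi-line comment still counts. The \b keeps words that merely begin with "Done" from matching.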
Thin signal — dev-loop throttle ratio (0.33) is acceptable by design

The throttle deferred 7 of 21 dev-loop triggers (14 finished). Non-critical triggers wait up to 4h for a dev-loop slot, while critical triggers bypass the throttle entirely. The dev-loop already launched 14 times in 24h (~1.7h average cadence). Reducing the throttle window would increase Opus spend for marginal latency gain. Skip.

Thin signal — dirty-repo recurrence (70 observations, 3 generated-only)

dirty-repo:workspace and dirty-repo:mission-control each recur ≥3 times. The operator is correctly observing and ticketing; the issue is that nothing resolves the dirty state. The operator_ticket_exists count (75) confirms the ticket exists but the problem persists. This is a resolution gap, not a tuning gap — no code change recommended from this report alone. Skip.

Operator Tuner Report — 2026-05-16
