
Operator Tuner Report — 2026-05-16

Window: last 24h (extended 72h for recurrence). Runs: 52. Total actions: 637. Avg actions/run: 12.25.

Action mix (recent window)

    75  operator_ticket_exists
    70  repo_dirty_observed
    52  run_start
    52  disk_worktree_snapshot
    52  memory_snapshot
    52  done_audit_summary
    52  run_complete
    49  blocked_lane_classification_summary
    33  promote_dead_zone_ticket
    25  recent_task_failure
    14  operator_dev_loop_finished
    13  active_lane_snapshot
    13  todo_backlog_observed
    13  terminal_ticket_runtime_closed
    12  active_lane_backlog_observed
    11  reopen_weak_done_ticket
    10  attach_operator_context_to_closed_failure_ticket
     7  operator_dev_loop_throttled
     5  create_operator_ticket
     4  breakglass_noop_healthy
     4  pickup_direct_run
     4  blocked_lane_completed_marked_done
     3  repo_dirty_all_known_generated
     3  memory_pressure_observed
     2  create_task_failure_ticket
     2  needs_input_runtime_failure_reset
     1  operator_ticket_resolved
     1  board_triage
     1  fix_deployed
     1  memory_recovered
     1  needs_input_pickup_diagnosis

Signals worth tuning

Reopened-ticket outcomes (current MC state)

Proposed patches

Operator Tuning Recommendations

  1. Active-lane threshold is too low, creating recurring noise

     Problem: [operator:active-lane-backlog] recurs ≥3 times in 72h; the board likely carries 8–12 active tickets as normal throughput, triggering escalation nearly every run.

     Location: ACTIVE_LANE_COUNT_TRIGGER = 10
     Change: Set to 15.
     Expected effect: active_lane_backlog_observed count drops; the operator ticket opens only when the board is genuinely congested. ACTIVE_LANE_STALE_HOURS_TRIGGER = 2 still catches a single stuck ticket.
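The interaction between the two triggers can be sketched as follows. This is only an illustration of the reasoning above, not the operator's actual code: the ActiveTicket type, the active_lane_backlog_reasons function, and the use of >= for both comparisons are assumptions. Only the two constant names and their values come from this report.

```python
from dataclasses import dataclass
from typing import List

ACTIVE_LANE_COUNT_TRIGGER = 15       # proposed value (the report lists 10 as current)
ACTIVE_LANE_STALE_HOURS_TRIGGER = 2  # unchanged


@dataclass
class ActiveTicket:
    """Hypothetical stand-in for a ticket in the active lane."""
    ticket_id: str
    hours_since_update: float


def active_lane_backlog_reasons(tickets: List[ActiveTicket]) -> List[str]:
    """Return the reasons (if any) to escalate the active lane.

    Assumption: both triggers use an inclusive (>=) comparison.
    """
    reasons = []
    # Count trigger: fires only when the lane is congested overall.
    if len(tickets) >= ACTIVE_LANE_COUNT_TRIGGER:
        reasons.append(f"count:{len(tickets)}")
    # Stale trigger: still fires for a single stuck ticket at any lane size.
    stale = [t.ticket_id for t in tickets
             if t.hours_since_update >= ACTIVE_LANE_STALE_HOURS_TRIGGER]
    if stale:
        reasons.append("stale:" + ",".join(stale))
    return reasons
```

Under these assumptions, a board holding 12 recently updated tickets produces no escalation, while the same board with one ticket idle for 3h still escalates through the stale trigger.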

  2. Reopened runnable-assignee tickets land in in_review and get stuck

     Problem: _reopen_status sends non-scheduled-source RUNNABLE_ASSIGNEE tickets to in_review. MC-3512 has been sitting in in_review for ~14h with failure_reason="Weak completion evidence found by Luci Operator", and no mechanism picks it up again.

     Location: _reopen_status method, the final branch: return "in_review" (last line of the method).
     Change: Return "todo" for any ticket assigned to a RUNNABLE_ASSIGNEE, not just scheduled-source ones; fall through to "in_review" only for non-runnable assignees. The reopen cap (_steward_reopen_count >= 1) prevents loops.
     Expected effect: Reopened worker tickets re-enter the pickup queue instead of stalling in review. Eliminates the MC-3512 class of stuck tickets.
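A minimal sketch of the proposed branching, under stated assumptions: the real _reopen_status signature is not shown in this report, so the dict-shaped ticket, the RUNNABLE_ASSIGNEES set and its members, and checking the reopen cap inside the same function are all hypothetical. Only the status strings and the _steward_reopen_count >= 1 cap come from the text above.

```python
from typing import Optional

# Hypothetical membership; the real RUNNABLE_ASSIGNEE definition is not shown here.
RUNNABLE_ASSIGNEES = frozenset({"worker-a", "worker-b"})


def reopen_status(ticket: dict) -> Optional[str]:
    """Return the status a weak-done ticket is reopened into, or None if capped.

    Assumption: the cap is checked here; the real code may check it elsewhere.
    """
    # Reopen cap: a ticket that was already reopened once is not reopened again,
    # which is what keeps the "todo" route from looping.
    if ticket.get("_steward_reopen_count", 0) >= 1:
        return None
    # Proposed change: any runnable assignee goes back to the pickup queue,
    # regardless of whether the ticket came from a scheduled source.
    if ticket.get("assignee") in RUNNABLE_ASSIGNEES:
        return "todo"
    # Non-runnable assignees still wait for review.
    return "in_review"
```

With this shape, a ticket like MC-3512 would re-enter todo on its first reopen and would not be reopened a second time.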

  3. "PROGRESS: Done" is not recognized as strong completion evidence

     Problem: STRONG_EVIDENCE_PATTERNS does not include the PROGRESS_DONE_MARKER pattern. A worker comment such as "PROGRESS: Done - added logging to the scheduler" contains no word from the current strong patterns (fixed/resolved/implemented/updated/created/delivered/sent), so the ticket is flagged as weak and reopened.

     Location: STRONG_EVIDENCE_PATTERNS tuple.
     Change: Add re.compile(r"^\s*PROGRESS:\s*Done\b", re.I | re.M) to the tuple.
     Expected effect: Fewer false weak-done reopens for tickets where the worker explicitly marked progress as done. Should reduce the 11 reopens (reopen_weak_done_ticket) by an estimated 2–4 per day.
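The effect of the added pattern can be checked in isolation. In the sketch below, CURRENT_PATTERNS is a simplified stand-in built from the keywords listed above (the real tuple is not shown in this report), and has_strong_evidence is a hypothetical helper. Only the added regular expression is taken verbatim from the proposed change.

```python
import re

# Stand-in for the existing tuple, built from the keywords named in the report.
CURRENT_PATTERNS = (
    re.compile(
        r"\b(fixed|resolved|implemented|updated|created|delivered|sent)\b",
        re.I,
    ),
)

# Proposed tuple: the existing patterns plus the PROGRESS: Done marker.
STRONG_EVIDENCE_PATTERNS = CURRENT_PATTERNS + (
    re.compile(r"^\s*PROGRESS:\s*Done\b", re.I | re.M),
)


def has_strong_evidence(comment: str, patterns=STRONG_EVIDENCE_PATTERNS) -> bool:
    """Hypothetical helper: True if any pattern matches anywhere in the comment."""
    return any(p.search(comment) for p in patterns)


comment = "PROGRESS: Done - added logging to the scheduler"
print(has_strong_evidence(comment, CURRENT_PATTERNS))  # False
print(has_strong_evidence(comment))                    # True
```

Because of re.M, the marker is recognized at the start of any line in a multi-line comment; because of the leading ^ and trailing \b, neither a mid-sentence mention nor a word like "Doneish" matches.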

  4. Thin signal: dev-loop throttle ratio (0.33) is acceptable by design

     With 7 launches throttled and 14 finished (7 of 21 attempts, 0.33), non-critical triggers wait up to 4h for a dev-loop slot, while critical triggers bypass the throttle entirely. The dev-loop already launched 14 times in 24h (~1.7h average cadence). Reducing the throttle window would increase Opus spend for marginal latency gain. Skip.
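The two figures quoted above follow from the action counts. This short check assumes the ratio is defined as throttled attempts over all attempts (throttled plus finished), which is the reading that reproduces 0.33; the variable names are illustrative.

```python
throttled = 7      # operator_dev_loop_throttled
finished = 14      # operator_dev_loop_finished
window_hours = 24  # report window

# Assumed definition: share of dev-loop attempts that were throttled.
throttle_ratio = throttled / (throttled + finished)
# Average spacing between dev-loop launches over the window.
avg_cadence_hours = window_hours / finished

print(f"throttle ratio: {throttle_ratio:.2f}")        # throttle ratio: 0.33
print(f"average cadence: {avg_cadence_hours:.1f}h")   # average cadence: 1.7h
```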

  5. Thin signal: dirty-repo recurrence (70 observations, 3 generated-only)

     dirty-repo:workspace and dirty-repo:mission-control each recur ≥3 times. The operator is correctly observing and ticketing; the issue is that nothing resolves the dirty state. The operator_ticket_exists count (75) confirms the ticket exists but the problem persists. This is a resolution gap, not a tuning gap, so no code change is recommended from this report alone. Skip.