Operator Tuner Report — 2026-05-19

Window: last 24h (extended 72h for recurrence). Runs: 47. Total actions: 457. Avg actions/run: 9.72.

Action mix (recent window)

    47  run_start
    47  disk_worktree_snapshot
    47  memory_snapshot
    47  done_audit_summary
    47  run_complete
    32  blocked_lane_classification_summary
    31  recent_task_failure
    30  repo_dirty_observed
    26  operator_ticket_exists
    15  operator_dev_loop_throttled
    15  terminal_ticket_runtime_closed
    12  promote_dead_zone_ticket
    11  active_lane_snapshot
     9  todo_backlog_observed
     6  repo_dirty_all_known_generated
     6  operator_dev_loop_finished
     6  create_operator_ticket
     5  attach_operator_context_to_closed_failure_ticket
     3  reopen_weak_done_ticket
     3  skipped_overlap
     2  memory_pressure_observed
     2  blocked_runtime_failure_reset
     1  operator_ticket_resolved
     1  stale_runtime_observed
     1  stale_runtime_marked
     1  stale_runtime_summary
     1  breakglass_noop_healthy
     1  pickup_direct_run
     1  operator_dev_loop_failed
     1  blocked_lane_completed_marked_done

Signals worth tuning

Reopened-ticket outcomes (current MC state)

Proposed patches

Operator Tuner Review — 2026-05-19

1. Non-critical task failures inflate dev-loop trigger count

Problem: recent_task_failure (31 events) is in trigger_actions, so every failed task run, including non-critical ones such as swing-trader scans, attempts to launch a dev-loop. Most attempts are throttled (15 of 21, a ratio of 0.71), which adds throttle-log noise with no operational gain. Critical failures are already handled by _ensure_failure_ticket and the watchdog.

Location: maybe_launch_operator_dev_loop, at the definition of the trigger_actions set.

Change: Remove "recent_task_failure" from trigger_actions, or add a post-filter so only critical-task failures trigger:

# Keep trigger actions, but drop recent_task_failure for non-critical tasks.
triggers = [a for a in self.actions
            if a.get("action") in trigger_actions
            and not (a.get("action") == "recent_task_failure"
                     and a.get("task_id") not in CRITICAL_TASK_IDS)]

Expected effect: Throttle events drop from ~15 to ~3-5 per 24h. Dev-loop still fires for critical-task failures. Average actions/run decreases slightly.
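
The post-filter option can be sketched as a standalone function. The contents of TRIGGER_ACTIONS and CRITICAL_TASK_IDS below, along with the sample task IDs, are placeholder assumptions for illustration; the real sets live in the operator module and may differ.

```python
# Hypothetical values for illustration only; the real sets are defined
# in the operator code and are not shown in this report.
CRITICAL_TASK_IDS = {"critical-task-a"}
TRIGGER_ACTIONS = {"recent_task_failure", "stale_runtime_observed"}


def select_triggers(actions, trigger_actions=TRIGGER_ACTIONS,
                    critical_task_ids=CRITICAL_TASK_IDS):
    """Return the actions that should attempt a dev-loop launch.

    A recent_task_failure only counts when its task_id is critical;
    every other trigger action passes through unchanged.
    """
    selected = []
    for a in actions:
        name = a.get("action")
        if name not in trigger_actions:
            continue
        if name == "recent_task_failure" and a.get("task_id") not in critical_task_ids:
            continue  # non-critical failure: no dev-loop attempt
        selected.append(a)
    return selected


sample = [
    {"action": "recent_task_failure", "task_id": "swing-trader-scan"},
    {"action": "recent_task_failure", "task_id": "critical-task-a"},
    {"action": "stale_runtime_observed"},
    {"action": "run_start"},
]
kept = select_triggers(sample)
```

In this sample only the critical-task failure and the non-failure trigger survive, which is the behavior the change is meant to produce.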


2. ACTIVE_LANE_STALE_HOURS_TRIGGER of 2 h is too tight

Problem: [operator:active-lane-backlog] recurs 4× in 72h. The dev-loop timeout alone is 3600 s (1 h), and operator tickets regularly run 1-2 h. A 2-hour stale threshold therefore flags healthy in-progress work as "backlog not draining," creating false escalations.

Location: Constant ACTIVE_LANE_STALE_HOURS_TRIGGER = 2.

Change: Raise to 3.

Expected effect: Active-lane-backlog recurrence drops from 4 to ~1-2 per 72h. Tickets legitimately running 2-3 h are no longer escalated.
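
To illustrate the effect of the threshold, here is a minimal sketch of a staleness check. The function name lane_is_stale and the strict "older than threshold" comparison are assumptions; the actual check in the operator may be written differently.

```python
from datetime import datetime, timedelta


def lane_is_stale(started_at, now, threshold_hours):
    """True when a lane has been active longer than the threshold (assumed rule)."""
    return now - started_at > timedelta(hours=threshold_hours)


now = datetime(2026, 5, 19, 12, 0)
started = now - timedelta(hours=2, minutes=30)  # a ticket 2.5 h into its run

flagged_at_2h = lane_is_stale(started, now, threshold_hours=2)
flagged_at_3h = lane_is_stale(started, now, threshold_hours=3)
```

Under these assumptions a ticket 2.5 h into its run is flagged with a threshold of 2 and left alone with a threshold of 3.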


3. Dev-loop throttle persists on non-zero subprocess exit

Problem: _operator_dev_loop_throttled writes the 6-hour throttle timestamp before the subprocess runs. If subprocess.run returns a non-zero exit code, no exception is raised, so execution falls through to operator_dev_loop_finished and the error handler that expires the throttle never fires. A failed dev-loop is then blocked from retrying for the full 6 hours.

Location: _launch_operator_dev_loop, immediately after the subprocess.run call and the operator_dev_loop_finished record.

Change: If result.returncode != 0, expire the throttle so the next run can retry:

if result.returncode != 0:
    # Back-date the stamp by the throttle length so the next tick can retry.
    expired = (self.options.now - timedelta(hours=6)).isoformat()
    state_path.write_text(
        json.dumps({"last_started_at": expired}), encoding="utf-8")

Expected effect: The 1 failed dev-loop in this window would have been eligible for retry on the next operator tick (~30 min) instead of waiting 6 h. Successful runs keep the full 6-hour throttle.
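
The throttle behavior described above can be sketched end to end. The helper names is_throttled and launch, the state file name, and the strict "less than 6 hours" comparison are all assumptions made for this example; only the last_started_at key and the 6-hour window come from the report.

```python
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

THROTTLE = timedelta(hours=6)


def is_throttled(state_path, now):
    """True while the last recorded start is less than THROTTLE ago (assumed rule)."""
    if not state_path.exists():
        return False
    data = json.loads(state_path.read_text(encoding="utf-8"))
    last = datetime.fromisoformat(data["last_started_at"])
    return now - last < THROTTLE


def launch(state_path, now, cmd):
    """Write the throttle stamp, run cmd, and expire the stamp on non-zero exit."""
    state_path.write_text(
        json.dumps({"last_started_at": now.isoformat()}), encoding="utf-8")
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Back-date the stamp so the next tick is not throttled.
        expired = (now - THROTTLE).isoformat()
        state_path.write_text(
            json.dumps({"last_started_at": expired}), encoding="utf-8")
    return result.returncode


now = datetime(2026, 5, 19, 12, 0)
state = Path(tempfile.mkdtemp()) / "dev_loop_state.json"  # hypothetical file name
next_tick = now + timedelta(minutes=30)

launch(state, now, [sys.executable, "-c", "raise SystemExit(1)"])
throttled_after_failure = is_throttled(state, next_tick)

launch(state, now, [sys.executable, "-c", "pass"])
throttled_after_success = is_throttled(state, next_tick)
```

With this rule, a failed run is eligible again at the next tick while a successful run stays throttled. If the real check uses an inclusive comparison, back-dating by slightly more than the throttle length would avoid an edge case at exactly 6 hours.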


4. Signals acknowledged but too thin to act on