MC-4606

Bound aggregate self-heal retry chain for ANY task (not just wiki-compile)
2026-06-13 07:34:31 SAST
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 11d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_medium

Description

Forward hardening surfaced during MC-4603. The root cause of the 5326s overrun was the self-heal chain: attempt 1 fails → claude diagnose → rerun, all timed from a single start. Each execute_command is bounded by its 3600s timeout, but the aggregate retry/heal chain is not.

MC-4603 fixed this for wiki-project-compile specifically (self_heal:false + detach). Any other task with self_heal:true and a long timeout can still chain-overrun the same way and, if it runs in the foreground, starve the serial scheduler.

Fix: in scheduler.py, bound the total wall-clock time of the self-heal chain per task, e.g. chain budget = timeout * (1 + max_heal_attempts), or a single absolute deadline shared across attempt + diagnose + rerun. Add a regression test that simulates a 3-attempt heal chain and asserts it is killed at the aggregate budget, not at 3x the per-call timeout.

Not urgent: wiki-compile (the only task that hit this) is already detached. This ticket generalizes the guard.
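
A minimal sketch of the "single absolute deadline" option, for discussion only. It is not the actual scheduler.py code: run_heal_chain, ChainBudgetExceeded, and the execute/diagnose callables are hypothetical names, and the real scheduler's interfaces may differ. The idea shown is that every step in the chain receives min(per-call timeout, time left until the shared deadline), so the chain as a whole cannot outlive its budget.

```python
import time
from typing import Callable


class ChainBudgetExceeded(Exception):
    """Raised when the aggregate self-heal chain runs out of wall-clock budget."""


def run_heal_chain(
    execute: Callable[[float], bool],   # hypothetical: runs the task, True on success
    diagnose: Callable[[float], None],  # hypothetical: the diagnose step between attempts
    per_call_timeout: float,
    max_heal_attempts: int,
    chain_budget: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Run attempt -> diagnose -> rerun under one deadline shared by all steps."""
    deadline = clock() + chain_budget

    def remaining() -> float:
        # Time left on the shared deadline; abort the chain once it is spent.
        left = deadline - clock()
        if left <= 0:
            raise ChainBudgetExceeded(f"self-heal chain exceeded {chain_budget}s")
        return left

    for attempt in range(1 + max_heal_attempts):
        # Each step gets the per-call timeout, capped by what the chain has left.
        if execute(min(per_call_timeout, remaining())):
            return True
        if attempt < max_heal_attempts:
            diagnose(min(per_call_timeout, remaining()))
    return False
```

A regression in the spirit of the ticket could inject a fake clock in which every step consumes its full granted timeout, then assert that the chain stops at chain_budget rather than at the sum of the per-call timeouts. Passing the clock as a parameter is a design choice made here so that such a test needs no real sleeping.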

Activity

done