{"definition_raw":"---\nid: persistent-luci-watchdog\ntitle: Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder\nschedule: \"*/5 * * * *\"\ntimeout: 120\nretry: false\nenabled: true\nnotify_on: failure\nrun_as: shell\ncommand: \"python3 /home/lucienne/workspace/scripts/persistent_luci_watchdog.py\"\ntags: [infra, luci, persistent, watchdog, liveness]\nruntime_profile: direct_python\n---\n**OVERRIDES runtime profile:** uses `direct_python` (plain Python, no model) because the command chain never invokes the `claude` CLI or any LLM API \u2014 pure-infra task; scheduler provider env injection is a no-op (MC-4942 U12 sweep).\n\nRuns every 5 minutes. Two independent jobs each tick:\n\n**1. Liveness keep-alive (MC-3763).** Checks `persistent_luci.status()`. If the\n`luci-persistent` orchestrator session is dead, restarts it via\n`ensure_started()` and Telegram-alerts. A 2-strike grace (restart only on the\n2nd consecutive offline observation) avoids fighting the daily 03:00 rotation\nwhile the fresh session boots. The strike counter is persisted to\n`/tmp/luci-persistent-keepalive.json` so `systemd_watchdog.py` corroborates\norchestrator health hourly from the same source. Closes the post-OOM /\npost-reboot dead window from ~30 min (lazy heal) to ~5\u201310 min.\n\n**2. Context escalation ladder.** Captures the `luci-persistent` tmux pane,\nparses the token counter from the Claude Code TUI footer, and escalates:\n\n- **>=700k tokens** (soft) \u2014 sends `/compact` so old turns are summarised\n  (defers if the pane is busy).\n- **>=820k tokens** (critical) \u2014 sends `/compact` with a forced interrupt\n  (does not defer \u2014 waiting risks overflowing the hard threshold).\n- **>=900k tokens** (hard) \u2014 `clear_persistent_luci()`: sends `/clear` and then\n  re-injects the continuity snapshot as the fresh session's first message, so\n  orchestrator working state survives the context wipe (MC-3766).\n\nCooldown is 5 minutes by default but shortens to 2 minutes above the soft\nthreshold so a fast climb gets enough attempts to escalate. Logs every decision\nto `~/workspace/logs/persistent-luci-watchdog.log`.\n\nImplemented for MC-2364 after Persistent Luci hit 201.2k tokens on 2026-04-28\nand stopped responding cleanly. Liveness keep-alive added for MC-3763 after the\norchestrator-robustness eval found the session had no real liveness supervisor.\n","id":"persistent-luci-watchdog","last_run":{"duration_s":0.303497,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413870.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T06:10:00.999674+02:00","status":"completed"},"next_run":"2026-06-13 06:15","next_run_iso":"2026-06-13T06:15:00+02:00","runs":[{"duration_s":0.303497,"finished_at":"2026-06-13T06:10:01.306238+02:00","id":413870,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413870.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T06:10:00.999674+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.288791,"finished_at":"2026-06-13T06:05:01.349826+02:00","id":413858,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413858.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T06:05:01.057763+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.277095,"finished_at":"2026-06-13T06:00:50.335061+02:00","id":413841,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413841.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T06:00:50.054924+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.28931,"finished_at":"2026-06-13T05:55:01.319261+02:00","id":413817,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413817.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:55:01.027420+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.260931,"finished_at":"2026-06-13T05:50:01.221802+02:00","id":413805,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413805.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:50:00.957481+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.294497,"finished_at":"2026-06-13T05:45:04.897903+02:00","id":413792,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413792.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:45:04.600187+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.268044,"finished_at":"2026-06-13T05:40:01.234067+02:00","id":413778,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413778.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:40:00.964114+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.278233,"finished_at":"2026-06-13T05:35:01.336585+02:00","id":413766,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413766.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:35:01.055653+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.277029,"finished_at":"2026-06-13T05:30:26.406004+02:00","id":413751,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413751.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:30:26.126324+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.254738,"finished_at":"2026-06-13T05:25:01.323532+02:00","id":413733,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413733.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:25:01.065421+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.291876,"finished_at":"2026-06-13T05:20:01.307343+02:00","id":413721,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413721.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:20:01.012819+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.298952,"finished_at":"2026-06-13T05:15:04.966250+02:00","id":413708,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413708.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:15:04.664381+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.297558,"finished_at":"2026-06-13T05:10:01.316636+02:00","id":413694,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413694.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:10:01.016651+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.28224,"finished_at":"2026-06-13T05:06:05.546892+02:00","id":413684,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413684.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:06:05.261855+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.290916,"finished_at":"2026-06-13T05:00:58.234091+02:00","id":413671,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413671.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T05:00:57.939913+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.251908,"finished_at":"2026-06-13T04:55:01.248767+02:00","id":413647,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413647.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T04:55:00.994212+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.288489,"finished_at":"2026-06-13T04:50:01.237411+02:00","id":413635,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413635.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T04:50:00.945893+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.276443,"finished_at":"2026-06-13T04:45:05.066672+02:00","id":413622,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413622.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T04:45:04.788097+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.274427,"finished_at":"2026-06-13T04:40:01.431998+02:00","id":413608,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413608.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T04:40:01.155087+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"},{"duration_s":0.293109,"finished_at":"2026-06-13T04:35:01.253932+02:00","id":413595,"log_path":"/home/lucienne/workspace/logs/task-runs/persistent-luci-watchdog/413595.log","output":"luci-persistent OK\n{\"keepalive\": {\"ok\": true, \"action\": \"noop\", \"reason\": \"online\", \"online\": true}, \"watchdog\": {\"ok\": true, \"action\": \"noop\", \"tokens\": 0, \"reason\": \"below_thresholds\"}}\n","started_at":"2026-06-13T04:35:00.958633+02:00","status":"completed","task_id":"persistent-luci-watchdog","task_name":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"}],"runs_limit":20,"schedule":"*/5 * * * *","schedule_label":{"description":"Every 5 minutes","is_custom":false,"label":"Every 5 min","sort":1,"sort_time":""},"stats":{"avg_duration":0.3318181230769231,"completed":2015,"failed":0,"timeout":0,"total":2015},"task":{"_description":"**OVERRIDES runtime profile:** uses `direct_python` (plain Python, no model) because the command chain never invokes the `claude` CLI or any LLM API \u2014 pure-infra task; scheduler provider env injection is a no-op (MC-4942 U12 sweep).\n\nRuns every 5 minutes. Two independent jobs each tick:\n\n**1. Liveness keep-alive (MC-3763).** Checks `persistent_luci.status()`. If the\n`luci-persistent` orchestrator session is dead, restarts it via\n`ensure_started()` and Telegram-alerts. A 2-strike grace (restart only on the\n2nd consecutive offline observation) avoids fighting the daily 03:00 rotation\nwhile the fresh session boots. The strike counter is persisted to\n`/tmp/luci-persistent-keepalive.json` so `systemd_watchdog.py` corroborates\norchestrator health hourly from the same source. Closes the post-OOM /\npost-reboot dead window from ~30 min (lazy heal) to ~5\u201310 min.\n\n**2. Context escalation ladder.** Captures the `luci-persistent` tmux pane,\nparses the token counter from the Claude Code TUI footer, and escalates:\n\n- **>=700k tokens** (soft) \u2014 sends `/compact` so old turns are summarised\n  (defers if the pane is busy).\n- **>=820k tokens** (critical) \u2014 sends `/compact` with a forced interrupt\n  (does not defer \u2014 waiting risks overflowing the hard threshold).\n- **>=900k tokens** (hard) \u2014 `clear_persistent_luci()`: sends `/clear` and then\n  re-injects the continuity snapshot as the fresh session's first message, so\n  orchestrator working state survives the context wipe (MC-3766).\n\nCooldown is 5 minutes by default but shortens to 2 minutes above the soft\nthreshold so a fast climb gets enough attempts to escalate. Logs every decision\nto `~/workspace/logs/persistent-luci-watchdog.log`.\n\nImplemented for MC-2364 after Persistent Luci hit 201.2k tokens on 2026-04-28\nand stopped responding cleanly. Liveness keep-alive added for MC-3763 after the\norchestrator-robustness eval found the session had no real liveness supervisor.","_file":"persistent-luci-watchdog.md","_path":"/home/lucienne/workspace/tasks/persistent-luci-watchdog.md","command":"python3 /home/lucienne/workspace/scripts/persistent_luci_watchdog.py","enabled":true,"id":"persistent-luci-watchdog","notify_on":"failure","retry":false,"run_as":"shell","runtime_profile":"direct_python","schedule":"*/5 * * * *","tags":["infra","luci","persistent","watchdog","liveness"],"timeout":120,"title":"Persistent Luci watchdog \u2014 liveness keep-alive + context escalation ladder"}}
