{"definition_raw":"---\nid: scheduler-watchdog\ntitle: Scheduler watchdog \u2014 observability for scheduled tasks\nschedule: \"0 * * * *\"\ntimeout: 300\nretry: false\nenabled: true\nnotify_on: failure\nrun_as: shell\ncommand: \". ~/.claude/env/api_keys.env && python3 /home/lucienne/workspace/scripts/scheduler_watchdog.py\"\ntags: [infra, observability, watchdog]\nruntime_profile: direct_python\n---\n**OVERRIDES runtime profile:** uses `direct_python` (plain Python, no model) because the command chain never invokes the `claude` CLI or any LLM API \u2014 pure-infra task; scheduler provider env injection is a no-op (MC-4942 U12 sweep).\n\nHourly watchdog that detects silent scheduler failures and creates MC tickets\nfor Luci to investigate.\n\nFour checks, each produces dedup'd MC tickets (stable identifier embedded in\ntitle prefix `[watchdog:<check>:<key>]`):\n\n- **A. YAML parse** \u2014 scans scheduler.log for `WARNING: Skipping <file> \u2014 bad frontmatter`.\n  Catches race conditions where the scheduler trips on a task file while syncs\n  are writing it.\n- **B. Missed fires** \u2014 for every enabled task, computes the previous expected\n  fire time via croniter. If `task_runs` has no entry within a 15-min grace\n  window past that time, flags it. Applies to ALL cadences (daily, weekly, monthly).\n- **C. Execution failures** \u2014 task_runs rows with status `failed` or `timeout`\n  in the last hour.\n- **D. Alert channel** \u2014 counts `Telegram send failed` entries in recent\n  scheduler.log tail.\n\nImplementation: `~/workspace/scripts/scheduler_watchdog.py`.\nDeployed 2026-04-19 after uncovering a mass scheduler drop (15 tasks stopped\nfiring together Apr 16-17) that had been invisible because the scheduler's\nown alerting was also broken. See MC-957, MC-958, MC-959, MC-960.\n","id":"scheduler-watchdog","last_run":{"duration_s":0.725591,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/413843.log","output":"","started_at":"2026-06-13T06:01:07.295503+02:00","status":"completed"},"next_run":"2026-06-13 07:00","next_run_iso":"2026-06-13T07:00:00+02:00","runs":[{"duration_s":0.725591,"finished_at":"2026-06-13T06:01:08.024018+02:00","id":413843,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/413843.log","output":"","started_at":"2026-06-13T06:01:07.295503+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.680023,"finished_at":"2026-06-13T05:04:30.254812+02:00","id":413675,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/413675.log","output":"","started_at":"2026-06-13T05:04:29.572861+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.663553,"finished_at":"2026-06-13T04:04:41.326200+02:00","id":413499,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/413499.log","output":"","started_at":"2026-06-13T04:04:40.659359+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.832104,"finished_at":"2026-06-13T03:00:38.360355+02:00","id":413315,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/413315.log","output":"","started_at":"2026-06-13T03:00:37.525056+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.8439,"finished_at":"2026-06-13T02:02:24.553574+02:00","id":413141,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/413141.log","output":"","started_at":"2026-06-13T02:02:23.706086+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.848895,"finished_at":"2026-06-13T01:00:41.645332+02:00","id":412965,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/412965.log","output":"","started_at":"2026-06-13T01:00:40.793952+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":1.004721,"finished_at":"2026-06-13T00:00:50.668410+02:00","id":412789,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/412789.log","output":"","started_at":"2026-06-13T00:00:49.662236+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.858788,"finished_at":"2026-06-12T23:00:39.579403+02:00","id":412613,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/412613.log","output":"","started_at":"2026-06-12T23:00:38.715691+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.922909,"finished_at":"2026-06-12T22:00:36.055704+02:00","id":412439,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/412439.log","output":"","started_at":"2026-06-12T22:00:35.129586+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.828276,"finished_at":"2026-06-12T21:00:34.418554+02:00","id":412266,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/412266.log","output":"","started_at":"2026-06-12T21:00:33.587657+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.933098,"finished_at":"2026-06-12T20:00:32.474745+02:00","id":412091,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/412091.log","output":"","started_at":"2026-06-12T20:00:31.539257+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.814005,"finished_at":"2026-06-12T19:04:06.874919+02:00","id":411923,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/411923.log","output":"","started_at":"2026-06-12T19:04:06.058531+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.801873,"finished_at":"2026-06-12T18:01:21.330504+02:00","id":411748,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/411748.log","output":"","started_at":"2026-06-12T18:01:20.526521+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.768636,"finished_at":"2026-06-12T17:03:12.291186+02:00","id":411575,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/411575.log","output":"","started_at":"2026-06-12T17:03:11.520337+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.798236,"finished_at":"2026-06-12T16:02:05.718879+02:00","id":411402,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/411402.log","output":"","started_at":"2026-06-12T16:02:04.917774+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.804021,"finished_at":"2026-06-12T15:00:32.805578+02:00","id":411227,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/411227.log","output":"","started_at":"2026-06-12T15:00:31.998934+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.950998,"finished_at":"2026-06-12T14:00:46.550800+02:00","id":411054,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/411054.log","output":"","started_at":"2026-06-12T14:00:45.597339+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.960199,"finished_at":"2026-06-12T13:00:50.157434+02:00","id":410880,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/410880.log","output":"","started_at":"2026-06-12T13:00:49.194037+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.794759,"finished_at":"2026-06-12T12:09:17.275548+02:00","id":410722,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/410722.log","output":"","started_at":"2026-06-12T12:09:16.478778+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"},{"duration_s":0.848487,"finished_at":"2026-06-12T11:00:28.243569+02:00","id":410544,"log_path":"/home/lucienne/workspace/logs/task-runs/scheduler-watchdog/410544.log","output":"","started_at":"2026-06-12T11:00:27.392629+02:00","status":"completed","task_id":"scheduler-watchdog","task_name":"Scheduler watchdog \u2014 observability for scheduled tasks"}],"runs_limit":20,"schedule":"0 * * * *","schedule_label":{"description":"Every hour at :00","is_custom":false,"label":"Hourly","sort":2,"sort_time":""},"stats":{"avg_duration":0.9752231142857142,"completed":175,"failed":0,"timeout":0,"total":175},"task":{"_description":"**OVERRIDES runtime profile:** uses `direct_python` (plain Python, no model) because the command chain never invokes the `claude` CLI or any LLM API \u2014 pure-infra task; scheduler provider env injection is a no-op (MC-4942 U12 sweep).\n\nHourly watchdog that detects silent scheduler failures and creates MC tickets\nfor Luci to investigate.\n\nFour checks, each produces dedup'd MC tickets (stable identifier embedded in\ntitle prefix `[watchdog:<check>:<key>]`):\n\n- **A. YAML parse** \u2014 scans scheduler.log for `WARNING: Skipping <file> \u2014 bad frontmatter`.\n  Catches race conditions where the scheduler trips on a task file while syncs\n  are writing it.\n- **B. Missed fires** \u2014 for every enabled task, computes the previous expected\n  fire time via croniter. If `task_runs` has no entry within a 15-min grace\n  window past that time, flags it. Applies to ALL cadences (daily, weekly, monthly).\n- **C. Execution failures** \u2014 task_runs rows with status `failed` or `timeout`\n  in the last hour.\n- **D. Alert channel** \u2014 counts `Telegram send failed` entries in recent\n  scheduler.log tail.\n\nImplementation: `~/workspace/scripts/scheduler_watchdog.py`.\nDeployed 2026-04-19 after uncovering a mass scheduler drop (15 tasks stopped\nfiring together Apr 16-17) that had been invisible because the scheduler's\nown alerting was also broken. See MC-957, MC-958, MC-959, MC-960.","_file":"scheduler-watchdog.md","_path":"/home/lucienne/workspace/tasks/scheduler-watchdog.md","command":". ~/.claude/env/api_keys.env && python3 /home/lucienne/workspace/scripts/scheduler_watchdog.py","enabled":true,"id":"scheduler-watchdog","notify_on":"failure","retry":false,"run_as":"shell","runtime_profile":"direct_python","schedule":"0 * * * *","tags":["infra","observability","watchdog"],"timeout":300,"title":"Scheduler watchdog \u2014 observability for scheduled tasks"}}
