
03-scheduler/overview
2026-06-13 06:15:46 SAST

Scheduler Overview

The scheduler (~/workspace/scheduler.py) is Luci's task execution engine. It runs every minute via cron, evaluates which tasks are due based on cron expressions, and executes them with locking, retry, self-healing, and failure escalation.

How It Works

Tick Cycle

  1. Load all .md files from ~/workspace/tasks/
  2. Parse YAML frontmatter; reject files with missing id or schedule
  3. Duplicate ID check -- if two files share an ID, the scheduler exits immediately and sends a Telegram alert
  4. For each enabled task:
     a. Skip if locked (another instance still running)
     b. Kill stale locks (lock age > timeout + 60s)
     c. Check if due (cron expression vs last completed run)
     d. If due, acquire lock and run
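
The duplicate-ID check and the stale-lock rule from the tick cycle could look roughly like the sketch below. This is not the code in scheduler.py: the helper names find_duplicate_ids and is_stale_lock are hypothetical, and tasks are assumed to be already parsed into dicts with an id key.

```python
from collections import Counter

# From the tick cycle: a lock is stale when its age exceeds timeout + 60s.
STALE_GRACE_SECONDS = 60


def find_duplicate_ids(tasks):
    """Return task IDs that appear in more than one parsed task (hypothetical helper)."""
    counts = Counter(task["id"] for task in tasks)
    return sorted(task_id for task_id, n in counts.items() if n > 1)


def is_stale_lock(lock_age_seconds, timeout_seconds):
    """Return True once a lock is older than the task timeout plus the grace period."""
    return lock_age_seconds > timeout_seconds + STALE_GRACE_SECONDS
```

A non-empty result from find_duplicate_ids would correspond to the hard-exit case in step 3; how the real scheduler reports it is not shown here.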

YAML Task Format

Every task file starts with a YAML frontmatter block:

---
id: example-task            # Unique identifier (duplicate = hard crash)
title: Human-readable name
schedule: "0 6 * * 1-5"    # Cron expression (evaluated in SAST)
timeout: 300                # Max seconds before kill (default: 600)
retry: true                 # Simple retry on first failure (default: false)
enabled: true               # false = skip entirely (default: true)
disabled_reason: by_choice  # why disabled: auto_suspended | retired | paused | by_choice (set when enabled: false)
self_heal: true             # Allow Claude to diagnose and fix (default: true)
notify_on: failure          # failure | success | always | never (default: failure)
notify_to: home             # notify.py destination key: dm|home|work|mc|life-manager|general (optional; injected as LUCI_NOTIFY_DEST env)
run_as: shell               # shell | claude | script
command: "python3 foo.py"   # Shell command to execute
tags: [infra, backup]       # Categorization tags
---

Markdown body with human-readable description of what the task does.
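
As an illustration of the defaults noted in the frontmatter comments, the following minimal sketch validates the two required keys and fills in the documented defaults. The name normalize_task is hypothetical, and the input is assumed to be the frontmatter already parsed into a dict; the real scheduler may structure this differently.

```python
# Defaults taken from the comments in the frontmatter example.
DEFAULTS = {
    "timeout": 600,
    "retry": False,
    "enabled": True,
    "self_heal": True,
    "notify_on": "failure",
}

# The tick cycle rejects files that lack either of these keys.
REQUIRED = ("id", "schedule")


def normalize_task(frontmatter):
    """Validate required keys and apply defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED if not frontmatter.get(key)]
    if missing:
        raise ValueError(f"task rejected, missing: {', '.join(missing)}")
    # Values set in the task file win over the defaults.
    return {**DEFAULTS, **frontmatter}
```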

Execution Model

Claude Command Isolation

The persistent Luci/Telegram session is the only process allowed to use the Telegram-enabled Claude configuration. Scheduler-owned Claude calls are guarded automatically:

If a task intentionally needs a different Claude configuration, make that explicit in the task definition and document why. Avoid sudo claude, remote ssh ... claude, or sh -c 'claude ...' in scheduler commands because those can bypass the bash function wrapper.

Working Directory Policy

The scheduler is machine-level infrastructure, so it defaults task commands to ~/workspace. Task definitions may override this in two ways:

| Task setting | Result |
|---|---|
| cwd: /some/path | Run exactly from that path |
| cwd_policy: pka or pka_repo | Run from ~/workspace/PKA |
| cwd_policy: mission-control, mission_control, or mc | Run from ~/workspace/mission-control |
| No cwd setting | Run from ~/workspace |

Many legacy task commands still begin with an explicit cd ...; that remains valid and should be treated as the command's own local override. Ticket workers use a related project-based resolver documented in 02-mission-control/worker-system.
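
The working directory rules above can be sketched as a small resolver. The function name resolve_cwd is hypothetical, and two behaviours are assumptions not stated on this page: an explicit cwd takes precedence over cwd_policy when both are set, and an unrecognised cwd_policy falls back to ~/workspace.

```python
from pathlib import Path

WORKSPACE = Path.home() / "workspace"

# cwd_policy aliases from the table, mapped to a directory under ~/workspace.
POLICY_DIRS = {
    "pka": "PKA",
    "pka_repo": "PKA",
    "mission-control": "mission-control",
    "mission_control": "mission-control",
    "mc": "mission-control",
}


def resolve_cwd(task):
    """Pick the working directory for a task dict (hypothetical helper)."""
    if task.get("cwd"):
        # Assumption: an explicit cwd wins over any cwd_policy.
        return Path(task["cwd"]).expanduser()
    policy = task.get("cwd_policy")
    if policy in POLICY_DIRS:
        return WORKSPACE / POLICY_DIRS[policy]
    # No setting (or, by assumption, an unknown policy): default to ~/workspace.
    return WORKSPACE
```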

Comment-Driven Control

Before running a task, the scheduler checks Mission Control (02-mission-control/overview) for unread human comments on the task. Claude interprets the comments and returns one of:

This allows Elmar to pause or adjust tasks by commenting on them in the MC dashboard.

Self-Healing

When a task fails, the scheduler follows an escalation ladder:

  1. Attempt 1: Run the command
  2. Attempt 2: Simple retry (if retry: true)
  3. Attempt 3: Self-heal -- Claude diagnoses the error and edits the script/task
  4. Attempt 4: Second self-heal attempt with updated error context
  5. Suspension: Task is disabled (enabled: false), Telegram alert sent, MC ticket created
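
The ladder above can be sketched as a single function. This is an illustrative outline only: run_with_escalation, run_once, and heal are hypothetical names, the sketch assumes the command is re-run after each heal, and it covers only tasks with self_heal: true.

```python
def run_with_escalation(run_once, heal, retry=False):
    """Walk the escalation ladder and return (succeeded, steps_taken).

    run_once() -> bool runs the task command once; heal() stands in for the
    self-heal step. Both are hypothetical callables supplied by the caller.
    """
    steps = ["run"]
    if run_once():
        return True, steps
    if retry:
        # Simple retry, only when the task sets retry: true.
        steps.append("retry")
        if run_once():
            return True, steps
    for attempt in (1, 2):
        # Two self-heal attempts, each followed by a re-run (assumption).
        steps.append(f"self-heal {attempt}")
        heal()
        if run_once():
            return True, steps
    # Every rung failed: the task would be suspended at this point.
    steps.append("suspend")
    return False, steps
```

For example, a task with retry: true that fails twice and succeeds after the first heal produces the steps run, retry, self-heal 1.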

Self-Heal Guard Rails

Consecutive Failure Handling

For tasks with self_heal: false, the scheduler tracks consecutive failures:

Error Escalation

| Event | Action |
|---|---|
| Task fails once | Retry (if enabled) |
| Retry fails | Self-heal attempt 1 |
| Heal 1 fails | Self-heal attempt 2 |
| Heal 2 fails | Suspend task, Telegram alert (force, bypasses quiet hours), MC ticket |
| Task timeout | Log as timeout, Telegram alert, MC ticket |
| Scheduler crash | Telegram alert (force), MC ticket, "all tasks paused" warning |
| Duplicate task ID | Hard exit with Telegram alert |

Key Commands

python3 scheduler.py tick       # Cron calls this every minute
python3 scheduler.py run <id>   # Force-run a specific task (ignores schedule)
python3 scheduler.py list       # Show all tasks with schedule, enabled, last/next run
python3 scheduler.py history    # Show last 20 task runs from mc.db
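
For orientation, the four subcommands could be wired up with argparse along these lines. This is a sketch of one possible command-line layout, not the actual parser in scheduler.py; build_parser and the task_id argument name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the four documented subcommands (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="scheduler.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("tick", help="evaluate and run due tasks")
    run = sub.add_parser("run", help="force-run one task, ignoring its schedule")
    run.add_argument("task_id")
    sub.add_parser("list", help="show all tasks")
    sub.add_parser("history", help="show recent task runs")
    return parser
```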

File Locations

| Path | Purpose |
|---|---|
| ~/workspace/scheduler.py | Main scheduler code |
| ~/workspace/tasks/*.md | Task definitions |
| ~/workspace/mission-control/mc.db | Run history (task_runs table) |
| /tmp/luci-task-{id}.lock | Per-task lock files |
| ~/workspace/logs/self-heal-audit.log | Heal attempt audit trail |
| ~/workspace/.heal-state.json | Probation tracking state |
| ~/workspace/logs/fail-counts/{id}.count | Consecutive failure counters |
| ~/workspace/prompts/self-heal.txt | Prompt template for Claude self-heal |
| ~/workspace/prompts/check-comments.txt | Prompt template for comment interpretation |

Related

Key Takeaways
