
02-mission-control/worker-system
2026-06-13 07:26:59 SAST

Worker System

Workers are background Claude Code sessions that pick up and execute Mission Control (MC) tickets.

Architecture

Scheduler (every 1 min)
  └── mc_pickup.py
        ├── Fetch todo tickets assigned to MC worker roles (Luci, Tessa, Scott, Atlas)
        ├── Check available worker slots (MAX_WORKERS = 2)
        └── For each ticket:
              ├── Spawn: claude -p <prompt> --worktree mc-mc-<id> --effort max
              ├── Stream output via stream-json
              ├── Tee raw stream-json to mc-worker-<id>-stream.log (MC-747)
              ├── Parse DONE:/REVIEW:/QUESTION: prefix
              ├── Store tool_use/tool_result/text events in ticket_events table
              ├── Push ticket_event over SSE for live UI panel
              ├── Post result as comment on ticket
              └── Update ticket status
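A minimal sketch of the stream-handling steps in the tree above (tee, then split into tool_use/tool_result/text events). It assumes Claude Code's stream-json output is one JSON object per line, with `assistant` messages carrying `text` and `tool_use` content blocks, `user` messages carrying `tool_result` blocks, and a closing `result` object. The function name `iter_ticket_events` and the simplified event dicts are hypothetical; writing to the ticket_events table and pushing SSE messages are not shown.

```python
import json
from typing import Iterable, Iterator, TextIO

# Content block types that become ticket events in this sketch.
EVENT_KINDS = ("text", "tool_use", "tool_result")


def iter_ticket_events(lines: Iterable[str], raw_log: TextIO) -> Iterator[dict]:
    """Tee raw stream-json lines to raw_log and yield simplified events.

    Assumes one JSON object per line: "assistant" messages carry text and
    tool_use blocks, "user" messages carry tool_result blocks, and a final
    "result" object holds the worker's closing text.
    """
    for line in lines:
        # Archive every raw line first, even ones that fail to parse.
        raw_log.write(line if line.endswith("\n") else line + "\n")
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON output
        if not isinstance(msg, dict):
            continue
        if msg.get("type") == "result":
            # This final text is what gets checked for DONE:/REVIEW:/QUESTION:.
            yield {"kind": "result", "text": msg.get("result", "")}
            continue
        if msg.get("type") not in ("assistant", "user"):
            continue  # e.g. the initial "system" message
        content = msg.get("message", {}).get("content", [])
        if not isinstance(content, list):
            continue  # plain-string content carries no typed blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") in EVENT_KINDS:
                yield {"kind": block["type"], "payload": block}
```

Teeing before parsing means the raw log stays complete even when a line is truncated or not JSON, which is what makes it useful for debugging.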

Key Files

Working Directory Contract

Luci is anchored at ~/workspace, but workers are not all launched from there. The dispatcher and scheduler use ~/workspace as the machine-level root; each worker then resolves its own current working directory from the ticket project.

| Ticket/project type | Worker cwd | Why |
|---|---|---|
| General, empty project, PKA | ~/workspace/PKA | PKA repo-local instructions, skills, wiki, docs, and vault paths apply |
| Mission Control, mission-control, MC | ~/workspace/mission-control | MC app code, templates, static assets, tests, and mc.db live here |
| Infrastructure, Life, Finance, Machine Ops | ~/workspace | Machine-level scripts, tasks, logs, wiki, reports, and operational glue live here |
| Any project with a matching slug directory | ~/workspace/<project-slug> | Lets project repos carry their own local context |
| Unknown project fallback | ~/workspace | Safe machine-level fallback when no repo can be resolved |
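The cwd table above can be read as a lookup with a fallback. The sketch below is a hypothetical rendering of it: the name `resolve_worker_cwd`, the lowercase normalization, the space-to-hyphen slug rule, and the slug character check are all assumptions, not the actual mc_pickup.py logic.

```python
import re
from pathlib import Path
from typing import Optional

WORKSPACE = Path.home() / "workspace"

# Hypothetical normalized project names mirroring the contract table.
PKA_PROJECTS = {"", "general", "pka"}
MC_PROJECTS = {"mission control", "mission-control", "mc"}
ROOT_PROJECTS = {"infrastructure", "life", "finance", "machine ops"}
SLUG_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


def resolve_worker_cwd(project: Optional[str], workspace: Path = WORKSPACE) -> Path:
    """Map a ticket's project to the directory the worker is launched in."""
    key = (project or "").strip().lower()
    if key in PKA_PROJECTS:
        return workspace / "PKA"
    if key in MC_PROJECTS:
        return workspace / "mission-control"
    if key in ROOT_PROJECTS:
        return workspace
    slug = key.replace(" ", "-")
    # Only accept plain slugs so a name like "../x" cannot escape the workspace.
    if SLUG_RE.fullmatch(slug) and (workspace / slug).is_dir():
        return workspace / slug
    return workspace  # unknown project: safe machine-level fallback
```

Checking the named projects before the slug directory keeps PKA and MC routing stable even if a same-named directory appears elsewhere in ~/workspace.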

This distinction matters because ~/workspace/wiki is Luci's operational wiki, while ~/workspace/PKA and ~/workspace/mission-control are separate repos with their own local instructions. Workers can still read wiki pages by absolute path from any cwd, for example ~/workspace/wiki/02-mission-control/worker-system.md.

Mission Control itself is launched by systemd with WorkingDirectory=/home/lucienne/workspace/mission-control, but app.py sets WORKSPACE = Path.home() / "workspace". That is why /wiki renders ~/workspace/wiki, even though the Flask app process starts inside the MC repo.

Worker Lifecycle

  1. Pickup: ticket-pickup scheduler task runs mc_pickup.py every minute
  2. Dispatch: checks for todo tickets assigned to active MC worker roles (luci, tessa, scott, atlas, plus lucienne as an alias for Luci), respects MAX_WORKERS (2), and spawns or routes the relevant runtime
  3. Isolation: each worker gets its own git worktree (--worktree mc-mc-<id>)
  4. Effort: all workers run with --effort max for the deepest reasoning
  5. Prompt: includes KEY FACTS (server identity, services), ticket context, all comments
  6. Role prompt: Luci coordinates, Tessa validates, Scott researches, and Atlas handles architecture review and sign-off. Implementation work invokes dev-loop; QA, research, and architecture phases use their role workflow instead.
  7. Output: worker outputs DONE:/REVIEW:/QUESTION: prefix
  8. Commit: the Stop hook auto-commits and auto-pushes any changes
  9. Cleanup: the worktree is cleaned up after the session ends
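Steps 1 to 3 amount to filtering tickets and capping them by free slots. A minimal sketch, assuming tickets arrive as dicts with hypothetical `status` and `assignee` keys and that the caller knows how many workers are already running; `select_tickets` and `worktree_name` are illustrative names, and Larry is absent because Larry tickets go over SSH.

```python
from typing import Dict, List

MAX_WORKERS = 2
# Role names from the lifecycle above; "lucienne" is an alias of Luci.
WORKER_ROLES = {"luci", "tessa", "scott", "atlas"}
ALIASES = {"lucienne": "luci"}


def normalize_assignee(name: str) -> str:
    key = name.strip().lower()
    return ALIASES.get(key, key)


def select_tickets(
    tickets: List[Dict], running: int, max_workers: int = MAX_WORKERS
) -> List[Dict]:
    """Pick todo tickets for worker roles without exceeding the slot limit."""
    slots = max(0, max_workers - running)
    eligible = [
        t
        for t in tickets
        if t.get("status") == "todo"
        and normalize_assignee(t.get("assignee") or "") in WORKER_ROLES
    ]
    return eligible[:slots]


def worktree_name(ticket_id) -> str:
    # Mirrors the --worktree mc-mc-<id> naming used for isolation.
    return f"mc-mc-{ticket_id}"
```

Because the slot count is recomputed every minute, a ticket that misses this cycle is simply picked up on a later one once a worker finishes.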

Output Prefixes

| Prefix | Ticket status | Meaning |
|---|---|---|
| DONE: | done | Work completed, ticket closed |
| REVIEW: | in_review | Needs Elmar's review before closing |
| QUESTION: | needs_input | Blocked, needs Elmar's input |
| (no prefix) | in_review | Default; needs manual review |
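The prefix table maps directly onto a small lookup. A minimal sketch; `status_for_output` is a hypothetical name, and the case-sensitive, leading-whitespace-tolerant match is an assumption about how mc_pickup.py compares prefixes.

```python
# Prefix-to-status mapping from the table above.
PREFIX_STATUS = {
    "DONE:": "done",
    "REVIEW:": "in_review",
    "QUESTION:": "needs_input",
}
DEFAULT_STATUS = "in_review"  # no recognized prefix: needs manual review


def status_for_output(output: str) -> str:
    """Return the ticket status implied by a worker's final output."""
    text = output.lstrip()
    for prefix, status in PREFIX_STATUS.items():
        if text.startswith(prefix):  # case-sensitive match (an assumption)
            return status
    return DEFAULT_STATUS
```

Defaulting to in_review means an unparseable result never closes a ticket silently.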

Downgrade Rule

If a worker posts REVIEW: or QUESTION: mid-session but then outputs DONE:, the DONE is downgraded to REVIEW. The first "needs input" signal wins.
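The downgrade rule only changes the outcome when the final output is DONE:. A minimal sketch, assuming the mid-session posts are available as a list of strings; `final_status` and its parameters are hypothetical names.

```python
from typing import Iterable

NEEDS_INPUT = ("REVIEW:", "QUESTION:")


def final_status(mid_session_posts: Iterable[str], final_output: str) -> str:
    """Apply the downgrade rule to a worker's final output."""
    final = final_output.lstrip()
    if final.startswith("DONE:"):
        # An earlier REVIEW:/QUESTION: post outranks a later DONE:.
        if any(post.lstrip().startswith(NEEDS_INPUT) for post in mid_session_posts):
            return "in_review"
        return "done"
    if final.startswith("QUESTION:"):
        return "needs_input"
    return "in_review"  # REVIEW: or no prefix
```

Erring toward review keeps a worker from closing a ticket after it has already signalled that a human should look at it.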

Session Persistence

Larry Dispatch

Tickets assigned to Larry are dispatched via SSH:

ssh elmar@46.225.208.1 "cd /home/elmar/cowork/projects/legalmind-explorer && claude -p ..."

Larry is intentionally limited to claude or codex as first-class CLIs. The post-completion review uses the selected Larry CLI/model instead of silently hardcoding Claude. A clean or skipped review now queues Tessa validation via the workflow action instead of marking the ticket done; critical review findings return the ticket to Larry for fixes.
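Because the remote shell re-parses the command string, the prompt has to be quoted for it. A minimal sketch of building the SSH argv for the Claude case and routing the review outcome, using the host and path from the command above. `larry_ssh_argv`, `route_larry_review`, the outcome strings, and the `return_to_larry` action name are hypothetical; only `send_to_tessa` appears in this page.

```python
import shlex
from typing import List

LARRY_HOST = "elmar@46.225.208.1"
LARRY_REPO = "/home/elmar/cowork/projects/legalmind-explorer"


def larry_ssh_argv(prompt: str) -> List[str]:
    """Build the ssh argv for a Claude-based Larry run."""
    # The remote shell parses this string, so quote the prompt for it.
    remote = f"cd {shlex.quote(LARRY_REPO)} && claude -p {shlex.quote(prompt)}"
    return ["ssh", LARRY_HOST, remote]


def route_larry_review(outcome: str) -> str:
    """Map a post-completion review outcome to the next workflow step."""
    if outcome in ("clean", "skipped"):
        return "send_to_tessa"  # queue Tessa validation, do not mark done
    if outcome == "critical":
        return "return_to_larry"  # hypothetical action name
    raise ValueError(f"review outcome not covered by this sketch: {outcome!r}")
```

Passing an argv list to `subprocess` avoids a second local shell, so only the remote side needs quoting.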

Workflow Handoffs

Workflow child tickets feed the parent ticket forward automatically instead of completing as isolated records:

  1. Larry implementation completes and queues send_to_tessa.
  2. Tessa validation completes and queues send_to_luci_review.
  3. Scott research completion also queues send_to_luci_review.
  4. Luci review can output DONE: APPROVE, REVIEW: RETURN_FOR_FIXES, or REVIEW: SIGNOFF_REQUIRED to close the ticket, return it to Larry, or queue Atlas, respectively.
  5. Atlas sign-off can output DONE: APPROVE or REVIEW: RETURN_FOR_FIXES.

MC remains the workflow state machine. The tmux/CLI transcript is evidence and an attach surface, not the source of truth for parent/child workflow state.
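The handoff chain is a small transition table keyed by role and output. A minimal sketch; the `send_to_*` names come from the list above, while `close`, `return_to_larry`, `queue_atlas`, and the assumption that Atlas's RETURN_FOR_FIXES also goes back to Larry are hypothetical.

```python
from typing import Optional

# Completion of these roles always hands the parent forward.
COMPLETION_ACTIONS = {
    "larry": "send_to_tessa",
    "tessa": "send_to_luci_review",
    "scott": "send_to_luci_review",
}
# Reviewer verdicts; "close", "return_to_larry", "queue_atlas" are hypothetical.
VERDICT_ACTIONS = {
    "luci": {
        "DONE: APPROVE": "close",
        "REVIEW: RETURN_FOR_FIXES": "return_to_larry",
        "REVIEW: SIGNOFF_REQUIRED": "queue_atlas",
    },
    "atlas": {
        "DONE: APPROVE": "close",
        "REVIEW: RETURN_FOR_FIXES": "return_to_larry",
    },
}


def next_action(role: str, output: str) -> Optional[str]:
    """Return the workflow action queued when a child ticket finishes."""
    role = role.strip().lower()
    if role in COMPLETION_ACTIONS:
        return COMPLETION_ACTIONS[role]
    text = output.strip()
    for verdict, action in VERDICT_ACTIONS.get(role, {}).items():
        if text.startswith(verdict):
            return action
    return None  # unrecognized verdict: leave the decision to MC or a human
```

Keeping the table in MC rather than in the CLI transcript matches the rule above that MC, not tmux, owns workflow state.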

Hooks

Queue Reaper (MC-462)

scripts/queue_reaper.py runs every 15 min via the queue-reaper scheduler task. It handles stuck queued_messages that workers failed to consume:

This prevents the "silent message death" problem where user replies like "yes, go ahead" would sit unprocessed indefinitely.
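A minimal sketch of how stuck messages could be found, assuming mc.db is SQLite and queued_messages has `id`, `ticket_id`, `created_at` (Unix seconds), and `consumed` columns. The schema, the 15-minute staleness threshold, and `find_stuck_messages` are hypothetical, and what the reaper then does with each message is not shown.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

STALE_AFTER_S = 15 * 60  # hypothetical threshold matching the reaper cadence


def find_stuck_messages(
    conn: sqlite3.Connection,
    now: Optional[float] = None,
    stale_after: float = STALE_AFTER_S,
) -> List[Tuple[int, str]]:
    """Return (id, ticket_id) for unconsumed messages older than stale_after."""
    now = time.time() if now is None else now
    return conn.execute(
        "SELECT id, ticket_id FROM queued_messages "
        "WHERE consumed = 0 AND created_at < ? "
        "ORDER BY created_at",
        (now - stale_after,),
    ).fetchall()
```

Passing `now` explicitly keeps the query deterministic and easy to test.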

Concurrency

Live Activity Panel (MC-747)

Workers emit structured events to ticket_events as they run, visible in the ticket detail UI:

Raw stream-json lines are also teed to ~/workspace/logs/mc-worker-<id>-stream.log for archival and debugging. The existing print log (mc-worker-<id>.log) is written as before.

On the UI side: the ticket detail page calls GET /api/v1/tickets/<identifier>/events on load to replay history, then subscribes to the per-ticket SSE stream (/api/v1/tickets/<id>/stream) for live ticket_event messages. The panel is collapsed by default and auto-opens when the ticket is in_progress.
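Each live ticket_event message on the per-ticket stream is a Server-Sent Events frame. A minimal sketch of serializing one, following the standard SSE wire format (`id:`, `event:`, `data:` lines terminated by a blank line); `sse_frame` and the payload shape are hypothetical, not MC's actual handler.

```python
import json
from typing import Optional


def sse_frame(event: str, payload: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets a reconnecting client send Last-Event-ID
    lines.append(f"event: {event}")
    # json.dumps escapes embedded newlines, so the payload fits on one data: line.
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message
```

In the browser, `EventSource.addEventListener("ticket_event", ...)` receives these frames, while the initial GET of /events covers history from before the subscription.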

Session Auto-Resume (MC-3249)

When a worker session is interrupted (e.g. context window exhaustion, OOM, rate-limit cascade), the scheduler detects the crash and auto-resumes:

Auth Env-Leak Fix (MC-3412)

Worker sessions spawned by mc_pickup.py now clear stale ANTHROPIC_* environment variables before launch. Previously, leftover API keys from prior sessions could leak into the worker subprocess, causing authentication confusion when the worker's intended provider was non-Anthropic (e.g. GLM, Kimi). The fix strips these variables in mc_pickup.py before calling Popen.
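A minimal sketch of the stripping step. This version filters a copy of the environment and passes it to the child, which leaves the dispatcher's own `os.environ` untouched; whether mc_pickup.py filters a copy or mutates `os.environ` is not stated here. `worker_env` and its `overrides` parameter (for provider settings the worker does need) are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def worker_env(
    base: Optional[Mapping[str, str]] = None,
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy an environment without ANTHROPIC_* vars, then apply overrides."""
    source = os.environ if base is None else base
    env = {k: v for k, v in source.items() if not k.startswith("ANTHROPIC_")}
    env.update(overrides or {})  # provider settings the worker actually needs
    return env
```

Usage: `subprocess.Popen(cmd, env=worker_env())`. Applying overrides after the filter means a deliberately configured provider variable still reaches the worker.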

Stale Runtime Reaping (MC-3536)

Orphaned interactive ticket runtimes (tmux sessions for in_progress tickets whose PID no longer exists) are now detected and cleaned up automatically. The reaper:
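A minimal sketch of the orphan check described above, for POSIX systems: signal 0 tests whether a PID exists without delivering anything. The `(ticket_id, status, pid)` tuple shape and the function names are hypothetical, and killing the leftover tmux session is not shown. An unreaped zombie still counts as existing under this check.

```python
import os
from typing import Iterable, List, Tuple


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists but belongs to another user
    return True


def orphaned_runtimes(runtimes: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Ticket ids whose status is in_progress but whose recorded PID is gone."""
    return [
        ticket_id
        for ticket_id, status, pid in runtimes
        if status == "in_progress" and not pid_alive(pid)
    ]
```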

MC Pickup Resilience (MC-3552)

The mc_request() helper in mc_pickup.py now handles transient gunicorn worker-recycle blips (connection refused, 502) with a short retry loop instead of immediately failing the pickup cycle. This prevents false-negative "MC unreachable" errors during gunicorn worker rotation, which was causing pickup to skip valid tickets.
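A minimal sketch of a retry wrapper that only retries the blips named above (connection refused, HTTP 502) and re-raises everything else immediately. `with_retry`, `is_transient`, the attempt count, and the linear backoff are assumptions, not the actual mc_request() code.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def is_transient(exc: BaseException) -> bool:
    """Connection refused or HTTP 502: typical of a gunicorn worker recycle."""
    if isinstance(exc, urllib.error.HTTPError):  # check before URLError (subclass)
        return exc.code == 502
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, ConnectionRefusedError)
    return isinstance(exc, ConnectionRefusedError)


def with_retry(
    call: Callable[[], T],
    attempts: int = 4,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with a short linear backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise  # permanent error, or out of retries
            sleep(delay * attempt)
    raise AssertionError("unreachable")
```

Usage: `with_retry(lambda: urllib.request.urlopen(url, timeout=10).read())`. Keeping the retry budget to a few seconds fits inside the one-minute pickup cadence, and non-transient errors such as a 404 still fail fast.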

Key Takeaways
