
MC-4052

MC-4050 hotfix: SQL format-equivalence + reaper lock ownership + helper TZ drift
2026-06-13 06:14:42 SAST


State: Done
Next Action: Closed
Owner: Luci
Runtime: Closed
Age: 21d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_medium · cwd /home/lucienne/workspace/mission-control · uptime 20d 16h · last activity 20d 13h ago

Description

Council review of the landed MC-4050 (commit 7f27d9a on mission-control master) surfaced 2 CRITICAL bugs + 2 IMPORTANT issues. Verified locally on live mc.db:

CRITICAL #1: completed_24h / failed_24h SQL format mismatch (mission-control app.py):
- OLD query: strftime('%Y-%m-%dT%H:%M:%S+02:00','now','+2 hours','-24 hours') → ISO-T string with +02:00 offset, matching the started_at format → 8384 rows
- NEW query: datetime('now','-24 hours') → space-separated UTC string → 13260 rows (+58% over-count)
- started_at format: '2026-05-23T14:51:04.823689+02:00' (ISO-T, +02:00)
- SQLite text comparison: 'T' (0x54) > ' ' (0x20), so every ISO-T row on the threshold's calendar date (day -1) trivially exceeds the space-formatted UTC threshold, whatever its time of day
- The lint test only catches the literal '+02:00' string, not the semantic regression
- Fix: produce an ISO-T threshold with the +02:00 offset using a format-preserving expression, OR compare normalised values. Add a behaviour test that counts rows in a fixed mc.db fixture and asserts equality across OLD↔NEW.

CRITICAL #2: reap_orphan_task_runs release_lock ownership race (scheduler.py):
- release_lock(task_id) does an unconditional os.unlink with no run_id ownership check
- After a scheduler crash, a stale 'running' row and a new healthy run hold the same task lock
- The reaper kills the stale row AND wipes the healthy run's lock → the next tick acquires the lock and double-executes
- Fix: pass run_id into release_lock, read the lock file contents, and only unlink if lock-owner == reaped run_id. Add a PID-skip + double-execute regression test.

IMPORTANT (3-of-4 council agreement):
- +02:00 SAST offset reintroduced in Python helpers (_running_age_seconds, _stuck_threshold_seconds, SAST const): the same drift risk the SQL fix targeted. Centralise SAST handling in a single helper.
- The PID-skip branch in reap_orphan_task_runs has zero test coverage (the most safety-critical path).
- /api/tasks/reap is unauthenticated: accepted as Lane B per audit scope.
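A minimal sketch of the behaviour test proposed under CRITICAL #1, to make the format mismatch concrete. It is not the code in app.py: the table name task_runs, the >= comparison, the five fixture rows, and the fixed NOW_UTC instant (standing in for 'now' so the counts are reproducible) are all assumptions for illustration. Only the two threshold expressions and the started_at format come from the ticket.

```python
import sqlite3

# Fixed reference instant (UTC) used instead of 'now' so counts are reproducible.
NOW_UTC = "2026-05-24 12:00:00"

# Threshold expressions quoted in the ticket, with 'now' replaced by a bound parameter.
OLD_THRESHOLD = "strftime('%Y-%m-%dT%H:%M:%S+02:00', ?, '+2 hours', '-24 hours')"
NEW_THRESHOLD = "datetime(?, '-24 hours')"

# Hypothetical fixture rows in the started_at format from the ticket (ISO-T, +02:00).
# The true 24h cut-off in +02:00 local time is 2026-05-23T14:00:00+02:00.
ROWS = [
    "2026-05-22T23:00:00.000000+02:00",  # older than 24h, previous calendar date
    "2026-05-23T01:00:00.000000+02:00",  # older than 24h, same date as the threshold
    "2026-05-23T13:59:59.000000+02:00",  # one second older than 24h
    "2026-05-23T14:30:00.000000+02:00",  # inside the 24h window
    "2026-05-24T09:00:00.000000+02:00",  # inside the 24h window
]


def build_fixture() -> sqlite3.Connection:
    """Create an in-memory table holding the fixture rows (table name is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_runs (started_at TEXT)")
    conn.executemany("INSERT INTO task_runs VALUES (?)", [(r,) for r in ROWS])
    return conn


def count_since(conn: sqlite3.Connection, threshold_expr: str) -> int:
    """Count rows whose started_at text compares >= the given threshold expression."""
    sql = f"SELECT COUNT(*) FROM task_runs WHERE started_at >= {threshold_expr}"
    return conn.execute(sql, (NOW_UTC,)).fetchone()[0]


conn = build_fixture()
old_count = count_since(conn, OLD_THRESHOLD)
new_count = count_since(conn, NEW_THRESHOLD)
print(f"OLD={old_count} NEW={new_count}")
```

With this fixture the OLD expression counts only the two rows inside the window, while the NEW expression also counts the two older rows that share the threshold's calendar date, because 'T' sorts after the space at the same character position. A regression test along these lines would assert that the replacement expression returns the same count as OLD.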
Council outputs: /tmp/council-{codex,gemini,glm,opus}.txt

DO NOT deploy MC-4050 reaper UI live until this hotfix lands: production dashboard currently shows inflated 24h counts on master. Reaper logic itself is sound; fix the SQL semantics + lock race and ship.

Parent: MC-4050.
Blocks: MC-4045 campaign completion + any restart/deploy of mission-control service.
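The ownership check proposed for CRITICAL #2 could look roughly like the sketch below. It assumes the lock is a plain file whose contents are the owning run_id; the lock directory, the lock_path and acquire_lock helpers, and the boolean return values are hypothetical and not taken from scheduler.py. Only the idea of passing run_id into release_lock and unlinking solely on an owner match comes from the ticket.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical lock directory; the real scheduler's lock location is not stated in the ticket.
LOCK_DIR = Path(tempfile.mkdtemp(prefix="mc-locks-"))


def lock_path(task_id: str) -> Path:
    """Path of the lock file for a task (naming scheme is an assumption)."""
    return LOCK_DIR / f"{task_id}.lock"


def acquire_lock(task_id: str, run_id: str) -> bool:
    """Create the lock file exclusively and record the owning run_id inside it."""
    try:
        fd = os.open(lock_path(task_id), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another run already holds the lock
    with os.fdopen(fd, "w") as fh:
        fh.write(run_id)
    return True


def release_lock(task_id: str, run_id: str) -> bool:
    """Unlink the lock only if its recorded owner matches run_id."""
    path = lock_path(task_id)
    try:
        owner = path.read_text().strip()
    except FileNotFoundError:
        return False  # nothing to release
    if owner != run_id:
        return False  # lock belongs to a different (possibly healthy) run: leave it
    path.unlink(missing_ok=True)
    return True


# Scenario from the ticket: a stale row is reaped while a healthy run holds the lock.
assert acquire_lock("task-42", "run-healthy")
reaped_released = release_lock("task-42", "run-stale")    # reaper acting on the stale run
lock_survived = lock_path("task-42").exists()
owner_released = release_lock("task-42", "run-healthy")   # the real owner finishing
print(reaped_released, lock_survived, owner_released)
```

In this sketch the reaper's call for the stale run leaves the healthy run's lock in place, so the next tick cannot acquire it and double-execute. The read-then-unlink sequence still leaves a small window between the check and the unlink; whether that matters depends on how the real scheduler serialises ticks, which the ticket does not say.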
