Ticket is done; runtime is closed. · profile claude_opus_1m_medium
Description
MC-5025
Operational repair follow-up discovered during MC-5020 controller recovery.
Problem:
- Hermes Kanban board `mc-internal` now fails CLI operations with SQLite corruption:
  - `PRAGMA integrity_check` reports: `Tree 8 page 8 cell 71: Rowid 3248 out of order`
  - Hermes printed: `Refusing to open corrupt kanban DB at /home/lucienne/.hermes/kanban/boards/mc-internal/kanban.db`
  - Backup created by Hermes guard: `/home/lucienne/.hermes/kanban/boards/mc-internal/kanban.db.corrupt.a55b515e3fd16602.bak`
- This surfaced while a Kanban worker heartbeat failed during MC-5020. MC-5020 code has been completed by hand and closed, but the internal Kanban board needs repair before further reliable internal dispatch.
Scope:
- Safely stop active mc-internal worker processes, or let them finish.
- Preserve current corrupt DB and backup files.
- Recover or rebuild `~/.hermes/kanban/boards/mc-internal/kanban.db` using SQLite `.recover` / last good copy / Hermes-supported path.
- Verify `sqlite3 ... 'PRAGMA integrity_check'` returns `ok`.
- Verify `hermes kanban --board mc-internal list`, `runs`, and `dispatch --dry-run` work.
- Avoid losing completed task history if possible; if history must be truncated, document exactly what was preserved/lost.
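The integrity gate in the scope above can be sketched in Python with the standard-library `sqlite3` module. This is a minimal illustration, not part of Hermes: the helper names `integrity_check` and `is_clean` are hypothetical, and the read-only URI open is an assumption chosen so the check cannot modify the file under inspection.

```python
import sqlite3


def integrity_check(db_path: str) -> list[str]:
    """Return the messages from PRAGMA integrity_check; ['ok'] means clean."""
    # Open read-only so inspecting a damaged file cannot alter it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the pragma returns any rows.
        return [str(exc)]
    finally:
        conn.close()


def is_clean(db_path: str) -> bool:
    """True only when the sole reported row is 'ok'."""
    return integrity_check(db_path) == ["ok"]
```

Running `is_clean` against both the original and any recovered copy gives a single pass/fail value to record alongside the CLI evidence.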
Acceptance:
- Kanban CLI no longer refuses the board.
- Integrity check passes.
- Active/running zombie rows from MC-5020 are reconciled or documented.
- Post a comment with the commands run, the evidence, and any remaining limitations.
Expected check-in: 2026-06-12 12:00 SAST.
Activity
Done · High · Luci
State: Done
Waiting: controller blocked internal kanban db corrupt
api (human)
Mission Control
21h ago
20h ago
MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
MC-5025 repair completed by Luci.
Actions taken:
- Confirmed MC-5028 is duplicate/cancelled; MC-5025 is canonical.
- Rechecked mc-internal DB corruption: original still failed `PRAGMA integrity_check` with `Tree 8 page 8 cell 71: Rowid 3248 out of order`.
- Quiesced the stale pre-repair worker PID 2920635 for t_164b2ade so replacement was safe.
- Built a row-level reconstructed DB at `/home/lucienne/.hermes/kanban/boards/mc-internal/recovery-20260612T111017+0200-rowid-rebuild-v3/kanban.rowid-rebuilt.db`.
- Preserved counts: tasks=193, task_runs=338, task_events=3253, task_comments=318, task_links=82.
- Preserved/renumbered the two post-corruption task_events that had reused low IDs.
- Reconstructed two task rows whose btree pages were malformed (`t_3decdaed`, `t_164b2ade`) and runs 331-338 from event/log/MC evidence.
- Took final live backup before replacement: `/home/lucienne/.hermes/kanban/boards/mc-internal/kanban.db.pre-mc5025-replace.20260612T111201+0200.bak`.
- Atomically replaced `/home/lucienne/.hermes/kanban/boards/mc-internal/kanban.db` and removed stale WAL/SHM files.
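The backup-then-replace step above can be sketched as follows. This is an illustration under assumptions, not the commands Luci actually ran: `replace_db` and the `.staged` suffix are hypothetical, and the sketch assumes every process that had the DB open has already been stopped, since swapping the file under a live writer is unsafe.

```python
import os
import shutil


def replace_db(live_path: str, rebuilt_path: str, backup_path: str) -> None:
    """Back up the live DB, swap in the rebuilt file, drop stale sidecars."""
    # 1. Preserve the current file, plus any WAL/SHM sidecars, before touching it.
    shutil.copy2(live_path, backup_path)
    for suffix in ("-wal", "-shm"):
        if os.path.exists(live_path + suffix):
            shutil.copy2(live_path + suffix, backup_path + suffix)

    # 2. Stage the rebuilt file next to the live one so the final rename stays
    #    on one filesystem; os.replace is then an atomic rename on POSIX.
    staged = live_path + ".staged"
    shutil.copy2(rebuilt_path, staged)
    os.replace(staged, live_path)

    # 3. The old WAL/SHM files describe the replaced DB, so remove them.
    for suffix in ("-wal", "-shm"):
        try:
            os.remove(live_path + suffix)
        except FileNotFoundError:
            pass
```

Staging in the same directory is the design choice that matters here: a rename across filesystems falls back to copy-and-delete and is no longer atomic.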
Verification:
- `sqlite3 ~/.hermes/kanban/boards/mc-internal/kanban.db 'PRAGMA integrity_check;'` => `ok`.
- `hermes kanban --board mc-internal stats` succeeds.
- Current board state after gateway resumed dispatch: archived=8, done=184, running=1 (`t_164b2ade` / MC-5019 continuation), blocked=0, ready=0.
Notes:
- MC-5018 and MC-5020 stale running claims were marked done/superseded based on their already-landed MC controller completions.
- MC-5019 continuation (`t_164b2ade`) was re-spawned by the gateway against the repaired DB (current pid 3003395 at verification time).
api · 20h ago
Final acceptance check: `hermes kanban --board mc-internal dispatch --dry-run --json` returned reclaimed=0, spawned=[], no stale/auto_blocked items; integrity_check still returns ok.
luci-board-manager · 21h ago
Controller health check found the mc-internal Hermes Kanban DB is corrupt: PRAGMA integrity_check reports `Tree 8 page 8 cell 71: Rowid 3248 out of order`. A delegated safe-recovery attempt made timestamped backups under `/home/lucienne/.hermes/kanban/boards/mc-internal/recovery-20260612T101803+0200*` and tried `.recover`, `VACUUM INTO`, `.dump`, and manual salvage. No recovered DB was both integrity-clean and at least as complete as the original (`tasks=193`, `task_runs=338`, valid-looking `task_events=3248`). Original DB was left untouched. Active internal workers were detected (`t_cf8719da` duplicate PIDs 2917920/2941611, `t_164b2ade` PID 2920635, `t_79b22be5` PID 2905999), so atomic replacement is unsafe until workers are quiesced and state is reconciled. Visibility-only boundary preserved: no MC pickup/runtime endpoints were used.
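The acceptance rule used in this comment (a candidate must be integrity-clean and at least as complete as the original) can be sketched as below. The helper names are hypothetical, and the table names are assumed to match the count labels quoted in this ticket; the real schema was not inspected for this sketch.

```python
import sqlite3

# Assumed table names, taken from the count labels quoted in this ticket.
TABLES = ("tasks", "task_runs", "task_events", "task_comments", "task_links")


def row_counts(db_path, tables=TABLES):
    """Count rows per table in a database that can be opened and scanned."""
    conn = sqlite3.connect(db_path)
    try:
        return {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
    finally:
        conn.close()


def shortfalls(candidate, baseline):
    """Map each table where the candidate has fewer rows to (have, need)."""
    return {
        t: (candidate.get(t, 0), need)
        for t, need in baseline.items()
        if candidate.get(t, 0) < need
    }


def is_acceptable(candidate, baseline, integrity_ok):
    """A candidate passes only if it is clean and loses no rows."""
    return integrity_ok and not shortfalls(candidate, baseline)
```

With the baseline from this comment (`tasks=193`, `task_runs=338`, `task_events=3248`), a candidate holding 191 tasks would be rejected with `{"tasks": (191, 193)}` as the documented shortfall.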
luci-board-manager · 20h ago
[visibility-only board-manager] Controller progress on internal Kanban recovery: verified MC-5018 is already done and terminated stale done-ticket Kanban worker t_cf8719da (pid 2917920; terminated=False, sigkill=True). Active MC-5019 worker t_164b2ade is still alive (pid 2920635, cwd=/home/lucienne/workspace/_mc_internal_worktrees/MC-5019-scheduled-runtime-profiles) so MC-5025 remains blocked; no mc-internal DB replacement attempted. No MC runtime/pickup/send/harvest endpoint was used.
luci-board-manager · 20h ago
[visibility-only board-manager] Follow-up verification: after SIGKILL, /proc/2917920 is now gone, so the stale MC-5018/t_cf8719da worker is confirmed stopped. MC-5019/t_164b2ade remains the only verified live internal Kanban worker from this inspection.
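The "is the worker really gone" check can also be scripted. The sketch below uses signal 0 rather than inspecting `/proc` as this comment did; it is an alternative technique, and `pid_alive` is a hypothetical helper name. Note that a zombie process that has not yet been reaped still counts as alive under this test.

```python
import os


def pid_alive(pid: int) -> bool:
    """Report whether a process with this PID currently exists (POSIX)."""
    try:
        # Signal 0 delivers nothing; it only performs the existence and
        # permission checks for the target PID.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

PIDs can be reused, so a `True` result is best paired with a check of the process's command line or working directory before treating it as the same worker.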