- Date: 2026-04-20
- Author: Luci
- Audience: Elmar (decision), Lucienne (sync)
- Status: Plan, awaiting Elmar approval (A/B/C from Telegram)
- Trigger: Elmar question 2026-04-20T04:20Z: "does our MC chat need a rethink, is tmux better, how do we compare to vibe-kanban?"

Replace the current "spawn claude -p per message, bake message into prompt" pattern with long-lived claude --resume sessions running inside a tmux session per ticket, with a per-ticket git worktree, streamed to the browser via pipe-pane → tail -F → SSE → xterm.js. Keep the existing MC tables; add three columns and one sidecar log per ticket. ~3-4 days of focused work, shippable in phases.
This is the right shape for a single-operator system on one Linux box. We are deliberately not porting to vibe-kanban's Rust/portable-pty/WebSocket architecture — that's optimised for cross-platform distribution we don't need.
(Full audit in conversation transcript 2026-04-20. Key facts:)
- `mission-control/mc_interactive.py`: interactive WebSocket sessions on `/ws/ticket/<id>`. Each user message spawns `claude -p "<prompt>" --resume <session_id>` as a one-shot subprocess. Output is streamed via stream-json → WebSocket chunks. The session UUID is persisted on `tickets.session_id`.
- `mission-control/mc_pickup.py`: background dispatcher. Same pattern (`--resume-ticket --queued-message …`), but messages come from the `queued_messages` table (atomic claim with 5-min lease). Output is broadcast via HTTP POST → SSE.
- BloopAI/vibe-kanban: Rust + React, 25k stars. Each card = git worktree + portable-pty PTY + WebSocket + xterm.js + SQLite (Workspace/Session/ExecutionProcess tables). Cross-platform shipped product (Mac/Win/Linux, npx-installable).

We steal two ideas, we don't port the architecture:
- `git worktree add ../wt/ticket-N -b ticket-N`. Parallel isolation, no merge conflicts between concurrent workers.
- `claude --output-format stream-json` written to a separate file for structured tool-call/cost/todo parsing, alongside the human-readable terminal pane.

| What we need | tmux gives us | portable-pty gives us |
|---|---|---|
| Survives MC restart | ✅ Free | ❌ Need supervisor |
| Survives SSH disconnect | ✅ Free | ❌ Need daemon |
| `tmux attach -t ticket-42` from SSH at 2am | ✅ Killer feature | ❌ No equivalent |
| TUI/permission-prompt rendering | ✅ Real PTY | ✅ Real PTY |
| Ship cost from current code | Low (shell-out) | High (Python PTY library + IPC) |
| Cross-platform (Mac/Windows) | ❌ Linux/macOS only | ✅ |
| Structured event model | ❌ Parse stream-json sidecar | ❌ Same |
We're on one Hetzner box. Cross-platform doesn't matter. SSH-attach debug at 2am does.

    ┌─────────────────┐         ┌──────────────────────────────────┐
    │  Browser        │         │  Hetzner (Luci)                  │
    │  (xterm.js)     │◀── SSE ─│                                  │
    │                 │         │  ┌───────────────────────────┐   │
    │  Send box ──────┼── POST ─▶  │ Flask MC (port 3001)      │   │
    └─────────────────┘         │  │  /api/ticket/N/input ─────┼──┐│
                                │  │  /api/ticket/N/stream ◀───┼─┐││
                                │  └───────────────────────────┘ │││
                                │                                │││
                                │  tmux server (system)          │││
                                │  ┌───────────────────────────┐ │││
                                │  │ session: ticket-42        │ │││
                                │  │  └─ claude --resume <uuid>│ │││
                                │  │     cwd: ../wt/ticket-42  │ │││
                                │  │     stdout ───────────────┼─┼┼┘
                                │  │  pipe-pane -o ▶ /var/log/ │ │└── tail -F
                                │  │     mc/ticket-42.log      │ └─ stream-json
                                │  │  --output-format          │    sidecar:
                                │  │     stream-json ▶ sidecar │    /var/log/mc/
                                │  └───────────────────────────┘    ticket-42.json
                                │                                  │
                                │  git worktrees:                  │
                                │    ../wt/ticket-42               │
                                │      (branch ticket-42)          │
                                │    ../wt/ticket-43               │
                                │      (branch ticket-43)          │
                                └──────────────────────────────────┘

- On ticket start, MC runs `git worktree add ../wt/ticket-N -b ticket-N`, creates `tmux new-session -d -s ticket-N -c ../wt/ticket-N 'claude --resume <uuid> --output-format stream-json | tee /var/log/mc/ticket-N.json'`, and runs `tmux pipe-pane -t ticket-N -o 'cat >> /var/log/mc/ticket-N.log'`.
- The stream endpoint runs `tail -F /var/log/mc/ticket-N.log`, replays the last 10 KB, then streams new bytes. xterm.js renders.
- The browser POSTs to `/api/ticket/N/input` → MC runs `tmux send-keys -t ticket-N -l "<message>"; tmux send-keys -t ticket-N Enter`. Claude (already running) sees stdin, processes, and streams the response back into the same pane → file → SSE → browser.
- On reboot, systemd starts `tmux-mc.service` (a tmux user-server unit), which restores empty tmux state; MC's startup task replays per-ticket `tmux new-session` for any ticket with `status=in_progress` (the worker resumes from `--resume <uuid>`).
- On ticket close, MC runs `tmux kill-session -t ticket-N`, archives sidecar logs, and optionally runs `git worktree remove`.

The tmux pane is for humans. The stream-json sidecar (`/var/log/mc/ticket-N.json`) is for MC. A small parser tails the JSON file and writes:
- tool_use events → existing ticket_events table (we already have this from MC-747)
- result events → mark turn complete, broadcast turn_end SSE
- cost_usd deltas → ticket cost counter
This keeps the structured-event model we already built; we're just changing how the events arrive (file tail vs subprocess pipe).
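
As a rough illustration of the sidecar parser described above, here is a minimal sketch that classifies newline-delimited JSON events. The flat event shape (`{"type": "tool_use", ...}`, `{"type": "result", "cost_usd": ...}`) is an assumption taken from the wording of this plan; the real stream-json output may nest these fields differently, and the function name `classify_sidecar_lines` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def classify_sidecar_lines(lines: Iterable[str]) -> Tuple[List[dict], int, float]:
    """Return (tool_use events, completed turn count, summed cost in USD).

    Assumes one JSON object per line with a top-level "type" field.
    """
    tool_uses: List[dict] = []
    turns_completed = 0
    cost_usd = 0.0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            # Possibly a half-written line; a real tailer would retry it later.
            continue
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "tool_use":
            tool_uses.append(event)  # would go to ticket_events
        elif kind == "result":
            turns_completed += 1     # would trigger a turn_end SSE broadcast
            cost_usd += float(event.get("cost_usd", 0.0))
    return tool_uses, turns_completed, cost_usd
```

In MC the three return values would feed the `ticket_events` table, the `turn_end` broadcast, and the ticket cost counter respectively; the sketch leaves those writes out so the parsing logic can be tested without a database.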
Add to tickets table:
- tmux_session TEXT — tmux session name (usually ticket-{id}, NULL if not yet spawned)
- worktree_path TEXT — absolute path to git worktree, NULL for tickets without code work
- pane_log_path TEXT — absolute path to pipe-pane log file
Keep existing session_id (claude UUID), worker_pid (now points to claude process inside tmux), worker_started, status.
Drop after migration:
- Nothing yet. Keep queued_messages for now — it becomes the buffer for messages typed while a previous turn is mid-stream (we send them when claude returns to prompt, not as fresh subprocesses).
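
The column additions above could be expressed as idempotent statements along these lines. This is a sketch that assumes PostgreSQL (suggested by `pg_schema.py`, but not confirmed here) and its `ADD COLUMN IF NOT EXISTS` syntax; `migration_statements` is a hypothetical helper, not an existing MC function.

```python
from typing import List, Tuple

# Column names and types as listed in the plan.
NEW_TICKET_COLUMNS: List[Tuple[str, str]] = [
    ("tmux_session", "TEXT"),
    ("worktree_path", "TEXT"),
    ("pane_log_path", "TEXT"),
]


def migration_statements(table: str = "tickets") -> List[str]:
    """Build one ALTER TABLE statement per new column (nullable by default)."""
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {sqltype};"
        for name, sqltype in NEW_TICKET_COLUMNS
    ]
```

Because each statement is a no-op when the column already exists, re-running the migration on restart should be safe.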
Goal: Prove the loop end-to-end on one ticket, no UI changes.
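
One way to keep this spike testable without a live tmux server is to build the tmux argument lists as pure functions and run them separately. The sketch below is an assumption about how `mc_tmux.py` might be laid out, not the actual module: the helper names are hypothetical, the `claude` flags are copied verbatim from this plan and not verified here, and the `--` before the message text is an added safeguard so a message beginning with `-` is not read as a flag.

```python
import shlex
import subprocess
from typing import List

LOG_DIR = "/var/log/mc"  # log location taken from the plan


def session_name(ticket_id: int) -> str:
    return f"ticket-{ticket_id}"


def new_session_argv(ticket_id: int, session_uuid: str, worktree_path: str) -> List[str]:
    """Detached tmux session running claude in the worktree, tee'd to the sidecar."""
    sidecar = f"{LOG_DIR}/{session_name(ticket_id)}.json"
    inner = (
        f"claude --resume {shlex.quote(session_uuid)} "
        f"--output-format stream-json | tee {shlex.quote(sidecar)}"
    )
    return ["tmux", "new-session", "-d", "-s", session_name(ticket_id),
            "-c", worktree_path, inner]


def pipe_pane_argv(ticket_id: int) -> List[str]:
    """Mirror pane output into the human-readable log file."""
    log = f"{LOG_DIR}/{session_name(ticket_id)}.log"
    return ["tmux", "pipe-pane", "-t", session_name(ticket_id), "-o",
            f"cat >> {shlex.quote(log)}"]


def send_input_argvs(ticket_id: int, text: str) -> List[List[str]]:
    """Literal text and Enter as two separate commands; no shell is involved."""
    name = session_name(ticket_id)
    return [
        ["tmux", "send-keys", "-t", name, "-l", "--", text],
        ["tmux", "send-keys", "-t", name, "Enter"],
    ]


def has_session_argv(ticket_id: int) -> List[str]:
    return ["tmux", "has-session", "-t", session_name(ticket_id)]


def kill_session_argv(ticket_id: int) -> List[str]:
    return ["tmux", "kill-session", "-t", session_name(ticket_id)]


def is_alive(ticket_id: int) -> bool:
    """True if tmux reports the session exists (exit code 0)."""
    return subprocess.run(has_session_argv(ticket_id),
                          capture_output=True).returncode == 0
```

Passing argument lists to `subprocess.run` rather than shell strings means the message body never goes through shell quoting, which is the point of the "no shell escaping" mitigation.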
`mission-control/mc_tmux.py`:

- `start_session(ticket_id, session_uuid, worktree_path)`: runs the tmux + pipe-pane + sidecar incantation
- `send_input(ticket_id, text)`: `send-keys -l` + Enter
- `kill_session(ticket_id)`
- `is_alive(ticket_id)`: `tmux has-session -t ticket-N`
- `python -m mc_tmux test 42`: spawns a real session, sends "hello what is 2+2", reads the pane log, prints output. Must work before moving on.

Goal: Browser sees live tmux pane.

- `/api/ticket/<id>/stream`: SSE generator that `tail -F`s `pane_log_path` and sends raw bytes.
- `POST /api/ticket/<id>/input`: body `{text: "..."}`, calls `send_input`.
- `templates/ticket.html`: drop in xterm.js (`xterm@5.x` from CDN), connect to SSE, render. Add a textarea + send button below.

Goal: Code-work tickets get isolated branches.

- For tickets with `kind=code` (or any explicit flag): `git worktree add ../wt/ticket-N -b ticket-N` from current HEAD.
- Pass `cwd: worktree_path` to tmux.
- On ticket close: `git worktree remove ../wt/ticket-N` (with confirmation if the branch has uncommitted changes).
- Add `/api/ticket/<id>/diff` for the ticket page to show the worktree diff vs base.

Goal: Server reboots don't orphan tickets.

- A systemd unit `tmux-mc.service` (user-mode) that owns the tmux server.
- On MC startup, query tickets with `status=in_progress AND tmux_session IS NOT NULL`; for each one, re-spawn the tmux session with `--resume <session_id>`.
- Liveness check via `tmux has-session`. If the session is missing, mark the ticket `needs_input` with the comment "worker dropped, awaiting reattach".

Goal: `mc_interactive.py` and `mc_pickup.py` route through `mc_tmux` instead of spawning per-message subprocesses.

- `mc_interactive.send_message` → if no tmux session, start one (Phase 1); then `send_input`. Output streaming switches from WebSocket-of-stream-json-chunks to SSE-of-pane-bytes (Phase 2).
- `mc_pickup` queue drain → same path: `send_input`, parse sidecar JSON for completion signal.
- `ssh luci 'tmux attach -t ticket-42'`.
- `tests/test_mc_tmux.py` covering spawn / send / kill / has-session / log tail.

Total: ~4-5 days of focused work, shippable phase-by-phase.

| Risk | Mitigation |
|---|---|
| `send-keys` race on rapid bursts | Use `-l` literal flag, send body and Enter as separate commands, no shell escaping |
| ANSI escape parsing in browser | xterm.js handles all escape codes natively, no custom parser needed |
| `tail -F` SSE leaks file handles on disconnect | Close file in SSE generator's `finally` block; cap idle SSE clients at 5 min like current WebSocket |
| Log files grow forever | Logrotate /var/log/mc/*.log daily, keep 7 days. Sidecar JSON archived to ~/workspace/reports/sessions/ on ticket close |
| tmux server crash kills all sessions | Run tmux as systemd user-service with Restart=always. Workers re-spawn on boot from Phase 4. |
| Two pollers on same Telegram bot (409 Conflict) | Unrelated; the existing --settings settings-worker.json gate still applies — tmux doesn't change this |
| Worktree disk usage | Cap at N=10 active worktrees; older ones archived. Monitor in heartbeat. |
| Migration breaks existing in-flight tickets | Phase 5 ships behind a per-ticket flag (use_tmux=true); roll forward gradually, leave old path as fallback for 1 week |
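
The file-handle mitigation in the table above can be sketched as a generator that owns the file and closes it in `finally`. This is a simplified stand-in for `tail -F`: it polls in-process instead of shelling out, and it does not follow log rotation. The names `tail_bytes` and `sse_frame`, and the choice to base64-encode chunks so raw terminal bytes fit SSE's line-based text framing, are assumptions for illustration.

```python
import base64
import os
import time
from typing import Callable, Iterator


def tail_bytes(path: str,
               replay_bytes: int = 10 * 1024,
               poll_interval: float = 0.25,
               should_stop: Callable[[], bool] = lambda: False) -> Iterator[bytes]:
    """Yield the last `replay_bytes` of the file, then new bytes as they arrive."""
    f = open(path, "rb")
    try:
        size = os.fstat(f.fileno()).st_size
        f.seek(max(0, size - replay_bytes))  # replay the tail of the log first
        while not should_stop():
            chunk = f.read(65536)
            if chunk:
                yield chunk
            else:
                time.sleep(poll_interval)    # nothing new yet; poll again
    finally:
        # Runs on normal exit and when the consumer calls .close() on the
        # generator, which WSGI servers typically do on client disconnect.
        f.close()


def sse_frame(chunk: bytes) -> bytes:
    """Wrap a chunk as one SSE message; base64 keeps control bytes intact."""
    return b"data: " + base64.b64encode(chunk) + b"\n\n"
```

A Flask view could return `Response((sse_frame(c) for c in tail_bytes(path)), mimetype="text/event-stream")`, with the browser decoding each message before writing it to xterm.js; `should_stop` is a hook where the 5-minute idle cap could be enforced.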
- Not throwing `queued_messages` away: still needed for "type while previous turn streaming".

- New files: `mission-control/mc_tmux.py`, `mission-control/templates/ticket-tmux.html` (or replace existing), `mission-control/static/xterm.css`, `mission-control/static/xterm.js`, `tests/test_mc_tmux.py`, `~/.config/systemd/user/tmux-mc.service`
- Modified files: `mission-control/mc_interactive.py` (route through `mc_tmux`), `mission-control/mc_pickup.py` (route through `mc_tmux`), `mission-control/app.py` (new endpoints), `mission-control/pg_schema.py` (3 new columns), `~/workspace/luci-manifest.md` (document new layout)
- `mc_interactive.py`, `mc_pickup.py`, `app.py`, `pg_schema.py`

A) Approve full plan: I create MC tickets MC-NNN through MC-NNN+5 (one per phase) and start Phase 1 tonight
B) Approve Phase 1 only as a POC — ship the spike, you review, then commit to the rest
C) Adjust scope (e.g. drop worktrees, drop xterm.js for plain <pre>) — tell me what to cut
D) Park — current MC is good enough