Luci
Topic noted: scheduled tasks page.
What do you need? Options:
- Bug on tasks page (`/tasks` in MC, `app.py` `tasks_page()`)
- New feature/UI change
- Task not running / runtime profile issue
- Audit existing tasks
Say what.
Luci
Running tester-panel on MC `/tasks` page. Step 0 first — establish context.
Luci
Context set. Now check page accessibility so agents can actually view it.
Luci
200, 664KB, no auth gate from localhost. Agents can fetch `http://localhost:3001/tasks` directly. Context locked:
- **What**: MC `/tasks` — scheduler dashboard. 143 tasks. Stats header (total/enabled/failed_24h/completed_24h/stuck + runtime-profile-lint chip), task list w/ status (disabled/overdue/stuck/pending/completed/failed/running), schedule+next-run, runtime_profile, run-history (last 50), hooks list. Detail page `/task/<id>`.
- **Who**: Luci (PM agent) + Elmar, monitoring scheduled-task health.
- **Source**: `app.py` `tasks_page()` L3629-3733, `templates/tasks.html` (1126L), `templates/task_detail.html` (498L).
- **Live**: http://localhost:3001/tasks
Launching Wave 1 — 6 parallel tester agents (review-only).
Luci
Wave 1 complete — 6 agents. Consolidated roadmap below. One **confirmed bug**, a stack of unambiguous quick wins, and structural calls that need your taste.
## `/tasks` audit — prioritised roadmap
### 🔴 Confirmed bug (forensic-verified, file:line)
- **Stuck-badge under-reports zombies for long-cadence tasks.** `_stuck_threshold_seconds` (app.py:3626) = `max(timeout, 2×cadence, 600)` but the actual reaper caps cadence at 24h (`scheduler.py:43 MAX_CADENCE_REAP_S=86400`). UI omits the cap → a daily/weekly task hung 24–48h is already reaper-eligible but UI still shows plain `running`, not `stuck`. Docstring claims "stays in sync with reaper" — it doesn't. Fix: `max(timeout, min(2*cadence, scheduler.MAX_CADENCE_REAP_S), 600)`.
- *All header counts MATCH (143/123/4/5964/0). No timezone off-by-2. Overdue/next-run logic sane. Only the stuck threshold is wrong.*
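A minimal sketch of the two formulas side by side, to show where they diverge. The constant names follow the log; the function names and the one-hour timeout are illustrative assumptions, not the code in `app.py`.

```python
# Sketch only: constants follow the log, function names are hypothetical.
MAX_CADENCE_REAP_S = 86_400  # 24h cap the reaper applies (scheduler.py:43 per the log)
MIN_STUCK_S = 600            # 10-minute floor from the formula in the log


def stuck_threshold_old(timeout_s: int, cadence_s: int) -> int:
    """UI formula as reported: the cadence term is uncapped."""
    return max(timeout_s, 2 * cadence_s, MIN_STUCK_S)


def stuck_threshold_fixed(timeout_s: int, cadence_s: int) -> int:
    """Proposed formula: cap the cadence term at the reaper's 24h limit."""
    return max(timeout_s, min(2 * cadence_s, MAX_CADENCE_REAP_S), MIN_STUCK_S)


DAY = 86_400
daily_old = stuck_threshold_old(3_600, DAY)    # 172800 s = 48h
daily_new = stuck_threshold_fixed(3_600, DAY)  # 86400 s = 24h
```

For a daily task the old formula waits 48h before showing `stuck`, while the capped one flips at 24h. That is the 24–48h window described above. Short-cadence tasks are unaffected because `2 * cadence` stays below the cap.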
### 🟢 Unambiguous quick wins (low-risk, high-value)
1. **Sticky table header + sticky first column** — confirmed by in-browser scroll: header & task-name scroll away on the 143-row list, no anchor mid-list. CSS only.
2. **Inline failure reason** on failed/timeout/stuck rows — surface `last_run.output[:120]`; today you must click into `/task/<id>` to learn *why*.
3. **Problem-first default sort** — default is `last_run desc`, buries failures on page 3+. Sort overdue→stuck→failed→rest (or auto-open Attention view when problems exist).
4. **Status legend + one-line page intro** — 7 status badges, zero legend; no "what is this page".
5. **`runtime_profile` column + filter + search token** — page's whole point is provider routing, yet no way to slice "show all GLM tasks".
6. **Next-run relative time** ("in 4h") both pages + **human cron** on detail page (list shows "Daily 05:35", detail shows raw `35 5 * * *`).
7. **auto-suspended split out of Disabled** — system killed it for repeated failure = top triage item, currently hidden in Disabled bucket; give own count + keep red (intentional-disable → grey).
8. **Label "Failed runs (24h)"** + click-through (header=4 run-events ≠ table=2 last-run-failed; unlabeled, can't reconcile).
9. **Fixed comment-bar overlap** on detail page (`task_detail.html:468` covers last content).
10. **Jargon tooltips** — stuck/reaper/cadence/runtime_profile/lint.
11. **Stray trailing-quote** in hooks script name (`app.py:3707`, strip quotes). Cosmetic.
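Quick win 3 (problem-first sort) could look roughly like this. It is a sketch under assumptions: the `TaskRow` shape and field names are hypothetical, and only the overdue→stuck→failed→rest order comes from the list above. Tasks that have never run (`last_run` is `None`) sort last within their bucket, so the key never compares `None` with a datetime.

```python
from datetime import datetime
from typing import Optional, TypedDict


class TaskRow(TypedDict):
    # Hypothetical row shape, for illustration only.
    name: str
    status: str
    last_run: Optional[datetime]


# Lower rank sorts first; every status not listed shares the "rest" rank.
PROBLEM_RANK = {"overdue": 0, "stuck": 1, "failed": 2}
REST_RANK = len(PROBLEM_RANK)


def problem_first_key(task: TaskRow) -> tuple[int, float]:
    rank = PROBLEM_RANK.get(task["status"], REST_RANK)
    last_run = task["last_run"]
    # Most recent run first inside a bucket; never-run tasks go to the end.
    recency = -last_run.timestamp() if last_run is not None else float("inf")
    return (rank, recency)


def sort_tasks(tasks: list[TaskRow]) -> list[TaskRow]:
    return sorted(tasks, key=problem_first_key)
```

Keeping the rank table as data means a new problem status, such as a separate auto-suspended bucket, is a one-line change.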
### 🟡 Bigger items (defer or batch)
- No auto-refresh: every action hard-reloads 664KB; lint poll endpoint built but unused.
- Multi-select status filters + bulk actions (run/disable/delete).
- Consecutive-fail trend / sparkline.
### ⚖️ Structural calls — need your decision (conservative-retirement: I won't cut without your say)
Two agents conflict here (IA skeptic wants to cut stat cards + hooks; decision-maker wants to keep+improve stats), so asking.
Luci
Decisions locked:
- Hooks → **move to /system**
- Stat cards → **drop** (keep "N/N enabled" subheading)
- IA → **full restructure** (4 health buckets: Problems/Running/Healthy/Disabled + Recent Runs → own `/runs` tab)
- Delivery → **MC ticket + dev-loop, all quick wins, Tessa UI gate**
This is a Tier 2/3 deliverable (3 files + 2 new routes + restructure) — orchestrator routes it to a ticket, not freelance edits here. Writing the spec, then opening the ticket.
Luci
Ticket created: **MC-4619** (status `todo`, assigned luci) → `mc_pickup` spawns a dev-loop worker on next cycle. I gate at `in_review`.
## Done
- **6-agent tester-panel** ran on `/tasks` (cold-user, decision-maker, power-user, UX sweep, IA skeptic, forensic verifier).
- **1 real bug** found + forensic-confirmed (stuck-badge threshold ignores reaper's 24h cap → long-cadence zombies mislabeled `running`). All header counts/timezone/overdue logic verified correct.
- **Decisions** captured from you, baked into spec.
- **Spec** written, **ticket** opened, routed to dev-loop.
## Links
Ticket: http://100.118.207.3:3001/ticket/MC-4619
Spec: http://100.118.207.3:3001/md-view?file=docs/plans/2026-06-02-tasks-page-overhaul.md
## What ships in MC-4619
Bug fix + full IA restructure (hooks→/system, drop stat cards, 8 tabs→4 health buckets, Recent Runs→/runs) + 10 quick wins (sticky headers, inline failure reason, runtime_profile column/filter, relative next-run, auto-suspend split, legend+intro, tooltips, comment-bar fix, quote strip, sort null-safety). Gates: regression tests → Tessa 1440×900 + 390px → second-opinion → push.
Deferred to follow-ups: auto-refresh/live status, bulk actions, fail-trend sparkline, per-task cost (schema-blocked).
Worker check-in: ~90 min. Watch board for `in_review`.
Luci
New panel target: MC board + tickets. Step 0 — establish context.
Luci
Context locked:
- **What**: MC **Board v2** (Kanban ticket board, JS-rendered — raw HTML has 0 cards, data comes from an API) + **ticket detail** page (`ticket.html`, 2202L — big). Recently reworked (MC-4597 action/ownership semantics).
- **Who**: Luci (orchestrator, daily) + Elmar (owner, triage) managing all work.
- **Source**: `app.py` `board()` L2626, `_board_v2_context()` (~L2877), board_v2 routes L2974+, `board_v2_post_status` L3001, `ticket_detail()` L3778. Templates: `board_v2.html` (658L), `ticket.html` (2202L).
- **Live**: http://localhost:3001/board · ticket e.g. http://localhost:3001/ticket/4619 . Board needs a real browser (browser-harness) or the JSON API to see cards.
Launching Wave 1 — 6 parallel testers (review-only).
Luci
Wave 1 done — 6 agents. **Big convergent finding** + 6 confirmed bugs (incl. one real production bug). Board's *data* is sound (forensic: all counts match DB exactly, no dup/missing/orphan, snooze+TZ correct) — problems are UX/IA + a few real bugs.
## MC board + tickets — consolidated roadmap
### 🚨 Headline (3 agents independent) — the "Blocked" column lies
All 42 open tickets = `waiting` + owner `elmar` → routed to **red "BLOCKED"** column (MC-4777 intended `waiting→blocked`). But each card correctly wears the gold **"Needs you"** rail. Column says *"blocked, not your problem"*; card says *"your queue"* — direct contradiction. Cold user reads "42 BLOCKED" as system on fire; Elmar's *entire decision queue* is disguised as a blocked wall, with no "Needs you" count, no "mine" filter. The card-level ownership rework is undermined by the column taxonomy. **This conflicts with a deliberate prior decision (MC-4777) → your call below.**
### 🔴 Confirmed bugs (ship via dev-loop)
1. **Board falsely flips to "You're offline" on a healthy server** (UX, reproduced 3×, root-caused). `static/sw.js:320` gives `/board` a 3500ms fetch timeout; `/` and `/ticket/` already raised to 18000ms for this exact bug (MC-4607/4613). 444KB board exceeds 3.5s under load → serves offline.html. One-liner; same fix needed on wiki/apps/reports L336/353 + nav fallback L447.
2. **J/K keyboard nav + Enter dead on v2** — `app.js:495` queries legacy `.ticket-card`; v2 renders `.v2-card`. Selector fix.
3. **Board column headers not sticky** — `.v2-col-header` no `position:sticky`; 42-deep column scrolls away, lose column+count. CSS fix.
4. **Workflow/WAT button wall leaks outside "Advanced" collapse** — `ticket.html:1140` `#workflow-panel` is a sibling *after* `</details>`; ~294px of controller-only buttons always show to Elmar. Move inside.
5. **v2 filters don't persist**: `v2Reload` never updates the URL, and the sessionStorage restore reads `location.search`, which v2 never writes, so a reload or shared link loses the filters.
6. **Latent LIMIT-101 clip** (forensic) — open columns share a 101-row recency-capped fetch before done-today filter; only **6 rows headroom**. If open+recent-done > 101, oldest open cards silently vanish. Fragile. Split query: cap only the Done column.
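Bug 6's proposed split query, sketched against an in-memory SQLite table. The table name, columns and the set of open statuses are assumptions for illustration; the real board query is not shown in this log. The point is only that the cap moves onto the Done query, so open rows can no longer be crowded out by recent done rows.

```python
import sqlite3

OPEN_STATUSES = ("inbox", "todo", "in_progress", "waiting")  # hypothetical set
DONE_CAP = 101  # in this sketch the cap applies to Done only


def fetch_board_rows(conn: sqlite3.Connection, done_cap: int = DONE_CAP):
    """Return (open_rows, done_rows); only the Done query carries a LIMIT."""
    marks = ",".join("?" for _ in OPEN_STATUSES)
    open_rows = conn.execute(
        f"SELECT id, status FROM tickets WHERE status IN ({marks}) "
        "ORDER BY updated_at DESC",
        OPEN_STATUSES,
    ).fetchall()
    done_rows = conn.execute(
        "SELECT id, status FROM tickets WHERE status = 'done' "
        "ORDER BY updated_at DESC LIMIT ?",
        (done_cap,),
    ).fetchall()
    return open_rows, done_rows


def demo_connection(n_open: int, n_done: int) -> sqlite3.Connection:
    """Build a demo table in which every done row is newer than every open row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
    )
    rows = [("waiting", i) for i in range(n_open)]
    rows += [("done", n_open + i) for i in range(n_done)]
    conn.executemany("INSERT INTO tickets (status, updated_at) VALUES (?, ?)", rows)
    return conn
```

In the demo, a single shared `LIMIT 101` ordered by recency returns only done rows once more than 101 of them are newer than the open ones. The split version always returns every open row.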
### 🟢 Quick wins
- **Search is broken for the use case** — board search = client-side over ≤253 rendered cards (title/id only, no body); Cmd+K palette = latest 50 tickets only, no `q` param. Across 100s of tickets it misses almost everything → server-side `q` (title+id+body).
- **Card chip soup** — up to 11 chips/card; owner-badge ("Needs you") duplicates assignee ("elmar"). Cap to ~4 load-bearing (owner/needs-you · assignee · project · age); demote comments/subtasks/deferred to one meta line; CTA only on review/needs_input.
- **Blocked-reason not inline** — `pending_state` has exact reason (payment / pdf_password / legal_review) but card shows generic "Waiting". Humanise onto card.
- **Age = created_at not last-activity** — stale ticket looks healthy; base on `updated_at` / add idle chip.
- **No legend, "Board v2 · live" version leak** — title → "Board", dismissible legend popover (columns/chips/rails).
- **Insider jargon + raw runtime dumps in ticket activity** ("Ctrl: ambient", DONE/REVIEW/QUESTION, "MC-3482 contract") → collapse system events under a toggle; tooltips for Larry/Tessa/runtime_profile.
- **Empty Activity feed = blank box** → empty/loading state.
- **Ticket sidebar**: accordion open-by-default stacks 8 sections; duplicate "Runtime" name+icon; near-empty History dup of Activity. Collapse past State+People; rename advanced → "Session"; cut History (fold timestamps into Timing).
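The server-side `q` search from the first quick win, as a rough sketch. It assumes a SQLite `tickets` table with `id`, `title`, `body` and `updated_at` columns and a default limit of 200; all of that is hypothetical here. Wildcards in the user's input are escaped so that `%` and `_` match literally.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_tickets(conn: sqlite3.Connection, q: str, limit: int = 200):
    """Match q against id, title and body (SQLite LIKE ignores ASCII case)."""
    pattern = f"%{escape_like(q.strip())}%"
    return conn.execute(
        "SELECT id, title FROM tickets "
        "WHERE id LIKE ? ESCAPE '\\' "
        "   OR title LIKE ? ESCAPE '\\' "
        "   OR body LIKE ? ESCAPE '\\' "
        "ORDER BY updated_at DESC LIMIT ?",
        (pattern, pattern, pattern, limit),
    ).fetchall()
```

A plain `LIKE` scan is the simplest option and is likely fine for a few thousand tickets. A full-text index would be the usual next step if it gets slow.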
### ✅ Confirmed good (no touch)
All board counts tie to DB; ticket sticky sub-header works; comment composer does **not** overlap (unlike task_detail); colour-blind-safe (label+shape); light-theme AA handled (MC-4654).
### 🟡 Deferred
No bulk actions; no undo on quick actions; no per-column priority sort; drag-drop doesn't exist (click/swipe only — set expectation or build).
Now the genuine decisions:
Luci
Decisions locked:
- **Add "Needs you" column** (route owner=elmar `waiting` there + count + "mine" filter; reserve red Blocked for controller-blocked — partial override of MC-4777)
- **Card: cap to 4 chips + reason inline**
- **Build server-side `q` search now**
- **Full overhaul, one ticket + dev-loop, Tessa-gated**
Tier 3 (new column taxonomy + search backend + IA restructure, 6 files). Writing spec → opening ticket.
Luci
Ticket **MC-4779** created (status `inbox`). Moving to `todo` so the dispatcher picks it up.
Luci
**MC-4779** queued (`todo`, luci) → dispatcher spawns a dev-loop worker. I gate at `in_review`.
## Done
- **6-agent tester-panel** on MC board + ticket pages (cold-user, decision-maker, power-user, UX, IA, forensic).
- **Forensic clean**: all board counts tie to DB, no dup/missing/orphan, snooze+TZ+ownership correct, composer no-overlap. Data layer trusted.
- **Headline**: your whole decision queue (42 tickets) was hidden in a red "Blocked" wall (MC-4777 routing) while cards said "Needs you" → fixing with a dedicated **Needs-you column** + count + mine-filter.
- **6 real bugs** found, incl. a production one: board falsely shows "You're offline" on a healthy server (sw.js 3500ms timeout).
- Spec written, ticket opened, routed to dev-loop.
## Links
Ticket: http://100.118.207.3:3001/ticket/MC-4779
Spec: http://100.118.207.3:3001/md-view?file=docs/plans/2026-06-07-board-tickets-overhaul.md
## In MC-4779
Bugs (offline-flip, J/K nav, sticky headers, workflow-leak, filter-persist, LIMIT clip) + Needs-you column + card cap-to-4 + reason-inline + server-side `q` search (board + Cmd+K) + ticket-page IA (collapse sidebar, cut History, rename dup Runtime, system-events toggle) + legend/version cleanup. Gates: regression → second-opinion → Tessa 1440×900 + 390px.
Deferred to follow-ups: drag-drop, bulk multi-select + undo, per-column priority sort.
Note: this **partially overrides MC-4777** (waiting→blocked routing) — flagged in the ticket. Worker check-in ~2h.
Luci
Already ran this exact panel 11h ago → MC-4779. Check its state before re-running (avoid duplicate findings).
Luci
MC-4779 = **done**, but **zero commits** to scope files since 09:00. Red flag (MC-4415 rule: verify done against commits, not narration). Investigating before re-running anything.
Luci
MC-4779 done via internal Kanban (claudebuilder, isolated worktree), landed to **origin/master @ 08ab4c5** — my local checkout is just behind. Live service already runs the new code (comment: /board 200, console clean, mobile gate APPROVED). So the board **changed** since this morning's panel — re-running now is valid (it audits the post-overhaul board). Syncing local source first so agents read current code.
Luci
`master == origin/master` (0/0) but **08ab4c5 does not exist in this repo**, and no board commits today. The "landed to master @ 08ab4c5" claim doesn't reconcile against ground truth (MC-4415 pattern). Checking what's actually deployed + where the worktree work went.
Luci
Ground truth contradicts the "done" claim:
- **sw.js bug #1 NOT fixed on live** — still `fetchWithTimeout(e.request, 3500)`.
- **"Board v2 · live" version leak still present** on live.
- **Local source == origin/master, zero board commits today** → local code unchanged from pre-MC-4779.
- Commit `08ab4c5` (claimed landing) **absent from repo**.
The "needs you" strings are likely the pre-existing card rail, not the new column. Pinning down what actually shipped vs what the worktree holds.
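One way to pin down which timeouts are still hard-coded is to list every `fetchWithTimeout(..., <number>)` call with its line number. A small sketch, assuming the call shape quoted above; the regex is a heuristic, not a JavaScript parser. A hit only shows that a literal exists. Which handler it belongs to still has to be read in context.

```python
import re

# Matches fetchWithTimeout(<first argument>, <integer literal>)
TIMEOUT_CALL = re.compile(r"fetchWithTimeout\([^,]+,\s*(\d+)\s*\)")


def literal_timeouts(source: str) -> list[tuple[int, int, str]]:
    """Return (line_number, timeout_ms, stripped_line) for each literal timeout."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in TIMEOUT_CALL.finditer(line):
            hits.append((number, int(match.group(1)), line.strip()))
    return hits
```

Calls that pass a named constant instead of a number are skipped on purpose, so the output is the list of places that still need a human look.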
Luci
Now it's clear. The work exists but **was never merged or deployed**:
- Branch `wt/MC-4779-board-tickets-overhaul` @ **08ab4c5** has the 3 MC-4779 commits — but `08ab4c5` is **NOT an ancestor of origin/master** (never merged).
- Live still has the sw.js 3500 bug + "Board v2 · live" leak → live runs **old code**.
- Ticket marked **done** with comment *"Landed to master and live"* — **false**.
Confirming what the live service actually serves from before I report.
Luci
Verified — definitive. **The MC-4779 overhaul never shipped:**
- `luci-dashboard` started **07:19 SAST** (before the work), serves from `~/workspace/mission-control` on **master** — and master has **no MC-4779 changes**.
- Live board still shows **INBOX/ACTIVE/BLOCKED/DONE** (BLOCKED ×94), **no Needs-you column** (the "needs you" ×10 is the pre-existing card rail). sw.js 3500 bug + "Board v2 · live" leak both still live.
- The work is committed on branch **`wt/MC-4779-board-tickets-overhaul` @ 08ab4c5** (3 commits) in the kanban worktree — **not merged to master, not deployed**.
- Ticket marked **done**, comment claimed *"Landed to master and live"* — **false**.
Re-running the panel now is pointless — it'd audit the *old* board and reproduce this morning's findings. Per loop-discipline: freeze, don't feed it. Correcting the false `done` and reporting instead.
Luci
**MC-4779 reopened** → `in_progress`, with the verified discrepancy logged.
## Bottom line
The board overhaul you're trying to re-test **was never shipped**. Marked done at 12:52 with *"landed to master and live"* — but it's only on an unmerged branch. Live board = unchanged from this morning. So a fresh panel would just re-find the same 6 bugs.
**Proof:** master has 0 MC-4779 commits · claimed commit 08ab4c5 not in master · live sw.js still 3500ms (bug #1) · live still "Board v2 · live" + BLOCKED column, no Needs-you column · `luci-dashboard` running pre-work code from 07:19.
**Work that exists** (built, not landed): branch `wt/MC-4779-board-tickets-overhaul` @ 08ab4c5 — 3 commits (Needs-you lane, card cap, search q, ticket Live-panel/System-events, History removal, sw.js fix). Prior auto-path (internal Kanban) flagged dirty-worktree + false-done 3× before claiming done.
Don't re-run the panel against old code. Land the existing work first, then panel the result if you want.
Luci
Hand-recover it is. First — resolve repo topology (worktree files are at `templates/` not `mission-control/templates/`, suggesting `mission-control` may be its own nested repo). Must know which repo/branch `luci-dashboard` actually serves before merging.
Luci
Major correction: **`mission-control` is its own nested git repo** (separate from `workspace`). I checked 08ab4c5 against the *wrong* repo earlier. In the mission-control repo, **08ab4c5 IS merged to origin/master**. So the work likely *did* land — but live serves old code (process started 07:19, pre-merge). Checking whether the disk checkout / running process is just stale.
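The wrong-repo mistake is easier to avoid when the ancestry check names its repository explicitly. A sketch using `git -C <repo> merge-base --is-ancestor`, which exits 0 when the commit is an ancestor, 1 when it is not, and with another code on errors such as an unknown commit. The function name and the injectable `run` parameter are illustrative choices.

```python
import subprocess
from typing import Callable


def is_ancestor(
    repo: str,
    commit: str,
    branch: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """True if `commit` is reachable from `branch` in the repository at `repo`."""
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, branch],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit code means git could not answer (unknown commit, wrong repo, ...).
    raise RuntimeError(result.stderr.strip() or f"git exited {result.returncode}")
```

Raising on "git could not answer" keeps "commit absent from this repo" distinct from "commit exists but is not merged". Those are the two cases that were conflated earlier in this session. The `repo` argument should be an already-expanded path (for example via `os.path.expanduser`).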
Luci
Real picture now clear: the mission-control checkout (what gunicorn serves) **has the MC-4779 commits on disk** (Needs-you column markers ×15), but:
- **Live process is stale** — gunicorn started 07:19, pre-merge → still running old in-memory app.py (old column routing). **Never restarted.**
- **sw.js board-timeout bug NOT actually fixed** — disk still `3500` at L430/460.
- **"Board v2" version leak NOT removed** — still on disk.
So most of the overhaul is on disk but undeployed, and 2 items were never really done. Auditing all 6 bugs + key changes on disk before fixing.
Luci
Revised, accurate diagnosis: **MC-4779 is ~fully implemented on disk/master** (Needs-you column routing in app.py L2758-2887, J/K nav fixed L529, v2Reload replaceState L532, workflow-panel moved, sw.js mostly converted to the 18000ms const). The real failure was a **deploy gap** — gunicorn started 07:19, never restarted after the ~12:50 merge → live runs stale in-memory code. Plus **one genuine miss: sw.js L430 + L460 still `3500`**.
So hand-recovery = fix the 2 sw.js lines + restart + verify live. That's a code change → invoking dev-loop first (mandatory).
Luci
dev-loop classified: **Tier 1** (2-line sw.js fix + deploy-and-verify of already-merged code). Reading sw.js L415-465 to understand L430/L460 before editing.
Luci
Reading sw.js resolved it: the **`/board` handler (L313-326) already uses `NAVIGATION_NETWORK_TIMEOUT_MS`** — board false-offline bug *is* fixed on disk. The two remaining `3500`s are L430 (JS/CSS network-first) and L460 (static-asset stale-while-revalidate) — different request classes, never serve offline.html, correct as-is. My earlier "miss" was a misread.
**So there are NO code gaps.** The entire failure is the **deploy gap**: gunicorn stale since 07:19 → live runs old in-memory app.py (old column routing) + cached templates. The full MC-4779 overhaul is on disk/master. Fix = **restart + verify**. No edits needed.
Restarting luci-dashboard to deploy.
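The deploy gap comes down to one comparison: did the serving process start before the code it is supposed to run was merged? A minimal sketch, assuming both timestamps are already available as datetimes in the same timezone; how they are obtained (service manager, `git log`) is left out. The function names are hypothetical.

```python
from datetime import datetime


def is_stale(process_started: datetime, code_changed: datetime) -> bool:
    """A process that started before the latest code change is running old code."""
    return process_started < code_changed


def deploy_verdict(process_started: datetime, code_changed: datetime) -> str:
    if is_stale(process_started, code_changed):
        lag = code_changed - process_started
        return f"STALE: process predates code by {lag}"
    return "OK: process started after the latest code change"
```

With the times from this session (process start 07:19, merge around 12:50) `is_stale` returns `True`. This only holds for servers that load code once at startup, which is the assumption here for the gunicorn process.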
Luci
Restarted 20:32:05, active, all probes 200. Verifying live now reflects the overhaul.
Luci
Live now reflects the full overhaul post-restart:
- **Needs-you column IS live**: columns = `Inbox · Active · Needs you · Blocked · Done` ✅
- **Version leak gone** (no "Board v2 · live") ✅
- **sw.js board handler** = `NAVIGATION_NETWORK_TIMEOUT_MS` (18000, no false-offline) ✅
Now the UI gate (this is live UI) + a count tie-out so I don't close on markup alone. Running count check + dispatching Tessa against live in parallel.
Luci
Tie-out: DB open = 35 (all `waiting`); board renders 64 unique ticket links (= 35 open split across Needs-you/Blocked/Active/Inbox + done-today). No open-ticket loss. Markers green. Now the live UI gate — dispatching Tessa against the live board + a ticket (desktop + mobile).
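The tie-out is a set difference between the open ticket IDs in the DB and the ticket links the board rendered. A sketch, assuming links look like `href="/ticket/MC-1234"`. That format is inferred from the ticket URLs in this log and may not match the real markup.

```python
import re

# Assumed link shape; adjust to the board's actual markup.
TICKET_LINK = re.compile(r'href="/ticket/(MC-\d+)"')


def rendered_ticket_ids(html: str) -> set[str]:
    """Unique ticket IDs linked from the rendered board HTML."""
    return set(TICKET_LINK.findall(html))


def missing_open_tickets(db_open_ids: set[str], html: str) -> set[str]:
    """Open tickets the DB knows about that the board did not render."""
    return db_open_ids - rendered_ticket_ids(html)
```

An empty result means no open ticket was lost. Extra rendered IDs (done-today cards) are expected and ignored.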
Luci
Tessa: **NOT APPROVED** — 1 Major (sticky headers still don't stick) + 2 Minor. Everything else passed (Needs-you column correct, counts tie 35==35, humanized reasons accurate, WAT hidden behind Advanced, History gone, accordion defaults, mobile switcher).
The Major is a real half-fix: worker added `position:sticky` to `.v2-col-header` but `.v2-kanban` has `overflow-x:hidden` (→ implicit `overflow-y:auto`) sized to content (6003px), so the page scrolls, not the container → sticky locks off-screen. Reading the CSS to fix precisely.
Luci
Worker's intent (L120-124): `#v2-panel` should be the scroll container the headers pin to. The bug is L21 `.v2-kanban { overflow-x: hidden }` → computes `overflow-y:auto` → `.v2-kanban` becomes an unintended, non-height-constrained scroll container, so headers pin to *it* (which scrolls off with the page). Checking `#v2-panel`/`.v2-panel-wrap` to pick the correct fix.
Luci
`#v2-panel` is the intended scroller (`height:100%; overflow:auto`). Need to confirm the height chain (`.v2-board-shell`) actually bounds it, else the document scrolls regardless. Reading the shell.
Luci
`.v2-board-shell` is `height:100%` in a flex column. Classic flex-scroll trap: a `flex:1` child (`.v2-panel-wrap`) without `min-height:0` won't shrink below content → nothing scrolls internally → document scrolls → sticky pins to the off-screen `.v2-kanban`. Need to confirm the base.html chain bounds the shell before fixing.
Luci
Confirmed: base.html has **no height-bounded wrapper** around `{% block content %}` (L63) — document scrolls, so `#v2-panel`'s internal scroll never engages and sticky pins to the off-screen kanban. Need to see the nav/content structure to bound the board shell correctly (without a risky global change).
Luci
Diagnosis complete. Root cause: the board's scroll chain (`.v2-board-shell` → `.v2-panel-wrap` → `#v2-panel` → `.v2-kanban`) uses `overflow:hidden/auto` but is **never height-bounded** (base.html `.container` grows with content → document scrolls). So `#v2-panel`'s internal scroll never engages, and `.v2-kanban`'s `overflow-x:hidden` (computes `overflow-y:auto`) makes it a non-scrolling scroll-container that the sticky header pins to off-screen.
Fix: convert the chain's `overflow` to `clip` (clips horizontally, but is **not** a scroll container) so sticky pins to the document viewport. Plus the `<title>` minor. Making the 5 edits.
Luci
5 edits made. Restarting to clear Jinja template cache, then verifying sticky behavior live with a real browser scroll.
Luci
Restart didn't take (auth). Restarting with sudo explicitly and confirming a fresh process timestamp (templates are Jinja-cached — must be a new process).
Luci
New process 21:11:26. Title fixed, CSS served. Now the real test — does the header actually pin on scroll? Checking via browser-harness (scroll + measure header rect).
Luci
browser-harness wants `-c`. Writing the check to a file and invoking it.
Luci
Still broken — headers at top `-1211` after scrolling 1400 (not pinned). The `clip` approach didn't engage sticky. Need to inspect the *computed* styles + look for a containing-block breaker (transform/contain/filter) on an ancestor. Deeper diagnostic.
Luci
Key finding: computed styles still show **`hidden/auto`** on all four ancestors — my edits were **overridden**. Something with higher specificity (likely a `static/*.css`) sets these overflows. Hunting the override.
Luci
No external override — so the server serves my new CSS (urllib confirmed `clip`/`visible`), but the **browser got the service-worker-cached old `/board`**. Classic SW stale-cache (the `verify_cache_bust` trap). Re-testing with a cache-busting URL to see the real fix.
Luci
Cache-bust URL gave an empty panel (SW interference). Let me bypass the service worker entirely via CDP, then load fresh and measure — that isolates whether the CSS fix itself works.
Luci
CDP needs a session; simpler to unregister the SW + clear caches via JS, then reload fresh.
Luci
Browser-harness flaked (background, no output). Per circuit-breaker, switching approach: one clean foreground run — single tab, unregister SW, clear caches, hard-reload in place, measure.
Luci
Three browser-harness attempts hung on SW operations — circuit-breaker tripped. Server serves the correct CSS (urllib-confirmed); the browser keeps getting the SW-cached old page. Delegating the authoritative in-browser verification to Tessa (fresh context, owns browser-harness, mandated Gate-2) with explicit SW-bypass instructions.
Luci
Tessa: **STICKY FIXED** — all 5 headers `top=0` after scroll (pinned), fresh-CSS gate confirmed (`overflowY:visible`, `overflowX:clip`), title now "Board — Mission Control". Both outstanding issues resolved. (Background task bf84ne0bv was the earlier hung attempt — ignoring.)
Committing the verified fix + closing MC-4779.
Luci
Recovery complete. MC-4779 now genuinely done — live-verified, Tessa-approved.
## What actually happened
You re-ran the panel expecting the overhauled board. It wasn't there — **MC-4779 was falsely closed**. Not missing code: a **deploy gap**.
- Overhaul WAS committed to mission-control master, but `luci-dashboard` ran since **07:19** (pre-merge) and was **never restarted** → live served stale in-memory code. The "landed to master and live" claim was never checked against the running process.
- (Cross-repo gotcha: `mission-control` is its own nested git repo — the work was on *its* master, which first looked unmerged from the workspace repo.)
## Hand-recovery (you approved, gated)
1. **Restarted** luci-dashboard → deployed the on-disk overhaul. Live now: columns `Inbox · Active · Needs you · Blocked · Done`, Needs-you count 35 = DB 35, humanised waiting reasons, WAT hidden behind Advanced, History gone, version leak gone, sw.js board timeout = 18000 (no false-offline).
2. **Tessa Gate-2** caught 1 Major: sticky headers *still* scrolled away — worker added `position:sticky` but the unbounded `overflow:hidden/auto` chain made `.v2-kanban` a non-scrolling scroll-container. Fixed by converting the chain to `overflow:clip`.
3. **Tessa re-verified**: all 5 headers `top=0` on scroll (fresh CSS, SW cache bypassed). Title `Board v2`→`Board`.
Commit `2ca94b8` pushed. MC-4779 closed with full recovery log.
## Did NOT re-run the panel
Board was still the old, unchanged version at session start → a fresh panel would've re-found this morning's bugs. Wasteful. Landed the work instead.
## Prevention note
The internal-Kanban worker reported done without verifying the **running process** — git-merged ≠ deployed. The dev-loop "done" gate should require a live-process check (restart + verify behavior), not just commit/push + an HTTP 200.
Want me to save that as a feedback memory + add a deploy-verification step to the dev-loop DONE gate?
Luci
Board overhaul now actually deployed + verified (MC-4779 landed, restart 21:11). So this panel run is the legit post-overhaul audit — board changed since the last real panel (which was pre-overhaul). Quick live-state confirm, then launch.
Luci
Live post-overhaul state confirmed: `Inbox · Active · Needs you (35) · Blocked (0) · Done`, sticky CSS present, title fixed. This is the legit audit of the new design. Launching Wave 1 — 6 parallel testers (review-only) against the overhauled board.
Luci
Wave 1 done — 6 agents on the **new** board. Verdict: overhaul largely succeeded (forensic: counts tie 35==35 / 29==29, no dropped/dup/orphan, age=last-activity verified, search works body+title+id all-statuses; UX: sticky headers pin `top=0` measured, AA contrast, filter-persist + Cmd+K + J/K all work). Next layer of findings below.
## New board audit — consolidated roadmap
### 🔴 HIGH (multiple agents, verified)
1. **Needs-you cards have NO actions** — `_V2_QUICK_ACTION_MAP` maps `waiting`/`needs_input`/`in_review` → `[]`, and CTA chips only fire for in_review/needs_input. So all 35 Elmar-owned cards offer *only* "open ticket" — yet the backend accepts `waiting→{todo,in_progress,cancelled}` (verified 200). The column surfaces what needs you but can't act on it. **Half the job.**
2. **Per-card `<style>` duplication = 87% of payload** — `board_v2_card.html` re-emits a 6.2KB `<style>` per card → 65 blocks/64 cards = 407KB of 469KB, re-parsed on every htmx swap/SSE tick. Hoist to the panel once. Real perf bug.
3. **3 of 5 columns empty in steady state** — Inbox 0 · Active 0 · Blocked 0 · Needs-you 35 · Done 29. It's a 2-column board in 5-column chrome; Active's lane apparatus renders nothing. (Refines the MC-4779 5-column choice → decision below.)
4. **Dead/stalled worker invisible** — `runtime_liveness` is computed but `board_v2_card.html` never renders it; legacy board had liveness dots, v2 dropped them. Luci can't spot a dead worker from the board. Regression.
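The payload share in item 2 (407KB of 469KB is about 87%) can be re-measured on any saved copy of the board HTML. A small sketch that counts `<style>` blocks and the fraction of bytes they occupy. It uses a regex rather than an HTML parser, which is an approximation that should be adequate for a size check.

```python
import re

STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.DOTALL | re.IGNORECASE)


def style_share(html: str) -> tuple[int, float]:
    """Return (number of <style> blocks, fraction of payload bytes they occupy)."""
    blocks = STYLE_BLOCK.findall(html)
    total = len(html.encode("utf-8"))
    style_bytes = sum(len(block.encode("utf-8")) for block in blocks)
    return len(blocks), (style_bytes / total if total else 0.0)
```

After hoisting the card CSS to the panel, the block count should drop to roughly one and the share should fall sharply, which makes this usable as a regression check.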
### 🟡 MED
- **Double-"waiting" humanise bug** — ticket hero: "Waiting: **waiting on** elmar regulatory decision" (card path humanises; hero `pending_cta_label` ticket.html L683 doesn't). Route hero through `_v2_humanise_pending_state`.
- **Needs-you 35-deep, arbitrary order** — all share phase_rank 4; can't tell a regulatory deadline from an RSVP (decision below).
- **"Ctrl: ambient" controller dropdown leaks in ticket header** by default (L754) — move into Advanced.
- **Humaniser gaps** — `ess`→"ess", `rsvp`→"rsvp" (add to acronyms); `parked_backlog`→"Waiting: parked backlog" (wrong — parked ≠ waiting-on-Elmar; route to deferred, not Needs-you).
- **Search default caps at 50** — `/api/v1/tickets?q=` returns 50 of 2977 unless `&limit=` passed → Cmd+K misses matches. Raise default to ~200 when `q`.
- **4 runtime surfaces on ticket** (hero + mobile bar + sidebar Runtime status + Advanced console): drop the sidebar dupe.
- **Dead Active lanes** (`waiting`/`blocked` lanes never render): remove.
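The double-"waiting" bug and the acronym gaps listed above share one shape: a single function should own the label. A hedged sketch follows. `humanise_pending` is a hypothetical stand-in, not the project's `_v2_humanise_pending_state`, and the acronym set is illustrative. It strips a leading "waiting" / "waiting on" before adding the prefix and uppercases known acronyms.

```python
ACRONYMS = {"ess", "rsvp", "pdf"}  # illustrative set; "pdf" is an assumption
PREFIX = "Waiting: "


def humanise_pending(reason: str) -> str:
    """Turn a raw pending reason into a single label with one 'Waiting' prefix."""
    words = reason.replace("_", " ").split()
    # Drop a leading "waiting" / "waiting on" so the prefix is not doubled.
    if words and words[0].lower() == "waiting":
        words = words[1:]
        if words and words[0].lower() == "on":
            words = words[1:]
    if not words:
        return "Waiting"
    shown = [w.upper() if w.lower() in ACRONYMS else w for w in words]
    return PREFIX + " ".join(shown)
```

Routing is out of scope for this sketch: a state like `parked_backlog` would still need to be sent to the deferred path before it ever reaches a function like this.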
### 🟢 LOW (batch as quick wins)
Assignee chips raw lowercase (`|title`); legend flow line omits Blocked; mobile 5-seg switcher overflows + defaults to empty Inbox (default to first non-empty); "VIEW ALL DONE" overflows column; latent future-snooze open not filtered; legend popover overlaps Blocked; `/board-legacy` still linked.
Genuine decisions:
Luci
Decisions locked:
- **Full action set** on human-gate cards (Resume/Not-needed/Answer/Move-to-todo/Approve/Send-back)
- **Auto-collapse empty columns** to thin rails
- **Priority-then-age sort** in Needs-you
- **Bugs + approved decisions**, one ticket + dev-loop + Tessa
Writing spec → opening ticket.
Luci
**MC-4795** queued (`todo`, luci) → dispatcher spawns a dev-loop worker. I gate at completion (and this time the gate requires verify-against-running-process, SW bypassed — the MC-4779 lesson is baked into the ticket).
## Done
- **Post-overhaul panel** (6 agents) against the now-live MC-4779 board. Verdict: **the overhaul works** — forensic counts tie (35==35, 29==29), search/sticky/age/contrast/filter-persist all verified good.
- Found the **next layer**: 4 HIGH + MED/LOW. Headline — the Needs-you column surfaces 35 tickets but can't act on any of them.
- Decisions captured, spec written, ticket opened, routed to dev-loop.
## Links
Ticket: http://100.118.207.3:3001/ticket/MC-4795
Spec: http://100.118.207.3:3001/md-view?file=docs/plans/2026-06-08-board-overhaul-round2.md
## In MC-4795
**Approved:** full card quick-actions (Resume/Not-needed/Answer/Approve/Send-back) · auto-collapse empty columns to rails · Needs-you priority-then-age sort.
**Bugs/perf:** hoist per-card CSS (87% of payload) · liveness dot for dead workers · double-"waiting" humanise fix · search-50 cap · dead Active lanes · humaniser acronyms/parked · Ctrl-dropdown into Advanced · runtime-dupe · assignee case · legend flow · mobile default segment.
**Deferred:** bulk select+undo, theme sub-grouping, Blocked→"Stuck (system)", /board-legacy cleanup.
Worker check-in ~2h. Watch the board.
Note: still pending your call on the earlier offer — save the **"git-merged ≠ deployed; verify the running process"** lesson as a feedback memory + add a deploy-verify step to the dev-loop DONE gate? That false-done cost a full re-investigation today.
You
Yes improve.
Luci
On it — two improvements: (1) save the lesson as a feedback memory, (2) add a deploy-verify step to the dev-loop DONE gate. First locating the memory store + dev-loop DONE gates.
Luci
Line 208 already shows a "git-merged ≠ deployed" note — let me read the current DONE gates before editing (avoid duplicating).
Luci
Tier 2 (2.8) DONE already has the deploy gate. Checking Tier 3 (3.8) and the single-session DONE rule for parity.
Luci
Tier 3 (3.8) lacks the gate; Tier 2 has it. Reading the single-session DONE bullet (~L19) — the universal DONE — to add the gate there (covers all tiers) plus Tier 3 for parity.
Luci
Tier 2 has the gate; Tier 3 and the universal single-session DONE don't. Adding to both for full coverage.
Luci
Dev-loop gates done (universal + Tier 3; Tier 2 already had it). Now saving the feedback memory — locating the feedback_*.md convention.
Luci
Feedback memories: `~/workspace/PKA/Vault/memory/feedback_*.md`. Checking the format to match, then writing the new one.
Luci
Format clear. Writing the feedback memory.
Luci
Memory written. Adding it to the MEMORY.md index, then committing both repos.
Luci
Both edits done. Committing both repos (skill → `conrelma/claude`; memory → PKA repo).
Luci
Both improvements shipped + pushed:
## Done
1. **dev-loop DONE gate** (`~/.claude/skills/dev-loop/SKILL.md`, pushed to `conrelma/claude`) — added deploy-verify gate to the universal single-session DONE + Tier 3 §3.8 (Tier 2 §2.8 already had it). Rule: any long-running-service change requires **restart + live behavior check against the running process** (SW/CDN cache bypassed). HTTP 200 ≠ deployed.
2. **Feedback memory** (`feedback_git_merged_not_deployed.md`, pushed to PKA repo, indexed in MEMORY.md) — captures the MC-4779 false-done: git-merged ≠ deployed (stale gunicorn), the SW-cache verification trap, and the nested-repo gotcha.
Both reach Lucienne on next 15-min sync. Future workers now hit the gate before they can claim done on a service change.
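A rough sketch of what the live behavior check could look like, not the actual gate text in `SKILL.md`. The `marker` string (something only the new build renders) and the helper names are assumptions for illustration:

```python
import urllib.request

# Ask intermediate caches to revalidate. A plain HTTP client never goes
# through a browser service worker, so the SW cache is bypassed by construction.
NO_CACHE_HEADERS = {"Cache-Control": "no-cache", "Pragma": "no-cache"}

def looks_deployed(status, body, marker):
    """HTTP 200 alone is not proof; the body must show the new behavior."""
    return status == 200 and marker in body

def check_live(url, marker, timeout=10):
    """Fetch the running service and look for the marker (hypothetical helper)."""
    req = urllib.request.Request(url, headers=NO_CACHE_HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return looks_deployed(resp.status, body, marker)
```

A stale process answers 200 with the old markup, which is why the check keys on content rather than status.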
You
What happened to the buttons on the cards? Now there is just Resume/Not Needed again. I'm sure we had more earlier, so it looks like we don't commit our changes, or we work on an old branch and then lose the changes we already made.
Luci
Could be two very different things — either correct (all current tickets are `waiting`, so only the waiting-card actions show) or a real regression (MC-4795 work reverted/not deployed). Not guessing — verifying ground truth, applying the deploy-gate lesson. Checking ticket state, git history, the deployed code, and the running process.
Luci
MC-4795 IS landed + deployed (commits 9097240 + 23da978 on master HEAD, 232 tests passed, restarted). Not reverted, not an old branch — the changes are committed. So the question is whether the full action set is in the code but simply not *visible* because every current ticket is `waiting`. Checking the actual action map + live ticket statuses.
Luci
Found it. The deployed action map (commit on master HEAD, 232 tests passed, restarted ~04:48) is the **full set** — per status:
- `inbox` → Start / Not needed
- `waiting` → **Resume / Not needed**
- `needs_input` → **Answer / Move to todo**
- `in_review` → **Approve / Send back**
- `in_progress` / `blocked` / `done` → no buttons
Current open tickets: **8 inbox + 31 waiting, and ZERO needs_input / in_review**. So the board can *only* show Resume/Not-needed (waiting) and Start/Not-needed (inbox) right now — the Answer/Approve/Send-back buttons are status-specific and nothing is in those states. Verifying the running process actually serves this (deploy-gate, SW bypassed) before I conclude.
Luci
Verified — **nothing was lost.** The running process (started 08:39 today, active) serves the committed code. Live board right now: **Resume × 31, Start × 8, Not needed × 39** — exactly matching the current ticket mix.
## What's actually happening
MC-4795 landed properly this time (commits `9097240` + `23da978` on master HEAD, 232 tests passed, deployed — the deploy-gate worked). The buttons are **per-status by design** (the full set you approved):
| Ticket status | Buttons |
|---|---|
| `inbox` | Start · Not needed |
| `waiting` | **Resume · Not needed** |
| `needs_input` | **Answer · Move to todo** |
| `in_review` | **Approve · Send back** |
| in_progress / blocked / done | (none) |
Right now **all 39 open tickets are `waiting` (31) or `inbox` (8): zero `needs_input`, zero `in_review`.** So the only buttons that *can* render are Resume/Not-needed (waiting) and Start/Not-needed (inbox). The Answer/Move-to-todo/Approve/Send-back buttons aren't gone: they're in the deployed code (`_V2_QUICK_ACTION_MAP` app.py:2813, rendered by the loop at app.py:3026) and only appear when a ticket is in those states.
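The arithmetic can be sketched as below. The dict mirrors the table above; the real `_V2_QUICK_ACTION_MAP` may be shaped differently, and `visible_button_counts` is a hypothetical helper:

```python
from collections import Counter

# Mirrors the per-status table; statuses not listed render no buttons.
QUICK_ACTIONS = {
    "inbox": ["Start", "Not needed"],
    "waiting": ["Resume", "Not needed"],
    "needs_input": ["Answer", "Move to todo"],
    "in_review": ["Approve", "Send back"],
}

def visible_button_counts(status_counts):
    """Count how many of each button the board would render."""
    counts = Counter()
    for status, n in status_counts.items():
        for action in QUICK_ACTIONS.get(status, []):
            counts[action] += n
    return counts

counts = visible_button_counts({"inbox": 8, "waiting": 31})
print(dict(counts))
```

With 8 `inbox` and 31 `waiting` tickets this gives Start × 8, Resume × 31, Not needed × 39, and no Answer/Approve/Send-back at all.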
You likely saw the fuller set earlier during testing (a ticket sitting in `in_review`/`needs_input`), or in the spec mockup.
## Your instinct wasn't wrong in general
That "we lose changes / old branch" failure mode is exactly MC-4779 last night. This is **not** that — I checked: committed, on HEAD, running process serves it. The deploy-gate I just added is why MC-4795's worker actually restarted + verified.
Want hard proof the other buttons work? I can flip one throwaway ticket to `in_review` for ~10s, screenshot the Approve/Send-back buttons, then revert it. Say the word and I'll demo it.
You
Test Canary skill on MC board
Luci
"Canary skill" — not in my loaded skill list. Checking if it exists (active, retired, or auto) before assuming.
Luci
Found it — `~/.claude/skills/canary/` exists (+ `scripts/mc_canary.py`). Reading the skill to run it correctly.
Luci
SKILL.md is a broken symlink to a Mac path (`/Users/elmar/...`) — Luci has the `.tmpl` instead. Reading that.
Luci
`canary` = post-deploy visual monitor (gstack skill). Two problems on Luci: its `SKILL.md` is a broken symlink to a Mac path (`/Users/elmar/...`), and it drives the gstack `$B` browse daemon which isn't on Luci. So I'll run the canary's **intent** — a quick single-pass health check on the live MC board — using browser-harness (Luci's browse engine), and flag the skill's portability gap.
Step 1 — HTTP health (reliable, SW-bypassing).
You
Stop, not gstack.
Luci
HTTP all green (200, fast, no offline fallback). Step 2 — browser pass: SW-bypassed screenshots + runtime console-error capture on the board + a ticket.
Luci
Stopping the gstack/browser approach. You mean `scripts/mc_canary.py` — the MC-specific canary, not the gstack skill. Reading it.
Luci
That's the real one: `mc_canary.py` (MC-804), an end-to-end API canary. It creates a test ticket → lists it → route-lookup → comment → cleanup(done), times each step vs baseline, and alerts on regression. Running it now (safe, it self-cleans).
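For orientation, a minimal sketch of the time-each-step-vs-baseline pattern. This is not `mc_canary.py`'s actual code; `run_canary`, the `tolerance` factor, and the step names are assumptions:

```python
import time

def run_canary(steps, baselines, tolerance=2.0):
    """Run steps in order, time each, flag any slower than tolerance x baseline.

    steps: list of (name, callable); baselines: name -> seconds.
    """
    timings = {}
    regressions = []
    for name, step in steps:
        start = time.perf_counter()
        step()
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        if elapsed > baselines[name] * tolerance:
            regressions.append(name)
    return timings, regressions

# Stand-in steps; a real canary would call the ticket API here.
steps = [
    ("create", lambda: None),
    ("list", lambda: time.sleep(0.05)),
]
baselines = {"create": 1.0, "list": 0.01}

timings, regressions = run_canary(steps, baselines)
print(regressions)
```

Here `list` is deliberately slower than twice its baseline, so it is the only step reported as a regression.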