Shadow-review false-fails cross-branch/cross-repo/non-code work and bounces orchestrator-closed tickets
State: Done · Next Action: Closed · Owner: Luci · Runtime: Closed · Age: 16d ago
Ticket is done; runtime is closed. · profile claude_opus_1m_high · cwd /home/lucienne/workspace/mission-control · uptime 16d 0h · last activity 15d 18h ago
Description
MC-4371
ROOT CAUSE of today's (2026-05-28) ticket bounce-storm. The shadow reviewer (mc_shadow_review.py) resolves the diff from the MC worktree branch only. When it cannot see the real diff it returns verdict=fail/uncertain, and that false verdict BOUNCES orchestrator-gated tickets back open (done->todo / done->needs_input), causing pickup to re-dispatch workers onto already-complete work — a wasteful loop.
False-fail classes observed today (all verified-real work):
1. SLOT-BRANCH blindness — work committed on slot2/mc-MC-4327 (MC-4350/4348/4345): diff_source=unavailable -> verdict=fail.
2. CROSS-REPO blindness — engine committed to the ~/.claude skills git, not the MC worktree (MC-4365 radio curation: afea8fc/0187d89): invisible to the MC diff check.
3. NON-CODE deliverables — research/triage tickets (MC-4358 agent-watch, MC-4362 macro research): no diff expected, but code-review criteria applied -> fail.
4. WRONG-REPO commit existence — claimed e05be7cf "not in any branch" (MC-4335) when it is on master.
FIX (proposed):
- Resolve the diff from the ticket's ACTUAL branch (slot{N}/mc-MC-<id>) and any referenced repo, not just the MC worktree/master.
- When diff_source is genuinely unavailable, ABSTAIN (advisory comment, low confidence) — do NOT emit verdict=fail and do NOT change a status the orchestrator set. The orchestrator gate is authoritative (3-role model: controller gates).
- Detect non-code/research tickets and gate on artifact existence, not code diff.
- Never auto-reopen a ticket that an orchestrator/operator moved to done.
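A minimal sketch of the verdict policy proposed above. It is not the code in mc_shadow_review.py: `ReviewInput`, `Decision`, `decide`, and `GATED_ACTORS` are hypothetical names, and treating a non-code ticket with no artifact as an abstain (rather than a fail) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: actors whose "done" the reviewer must never undo.
GATED_ACTORS = {"orchestrator", "operator"}


@dataclass
class ReviewInput:
    diff: Optional[str]        # None when no repo/branch yielded a diff
    is_code_ticket: bool       # False for research/triage tickets
    artifact_exists: bool      # deliverable found (report, attachment, ...)
    closed_by: Optional[str]   # who moved the ticket to done, if anyone
    llm_verdict: str = "pass"  # what the reviewer model said about the diff


@dataclass
class Decision:
    verdict: str               # "pass" | "fail" | "abstain"
    may_change_status: bool    # False means advisory comment only


def decide(r: ReviewInput) -> Decision:
    if not r.is_code_ticket:
        # Non-code work is gated on artifact existence, never on a diff.
        verdict = "pass" if r.artifact_exists else "abstain"
    elif r.diff is None:
        # Diff genuinely unavailable: abstain instead of emitting fail.
        verdict = "abstain"
    else:
        verdict = r.llm_verdict
    # Only a real fail may touch status, and never on a gated done.
    gated = r.closed_by in GATED_ACTORS
    return Decision(verdict, may_change_status=(verdict == "fail" and not gated))
```

With this shape, a fail on an orchestrator-closed ticket still produces a comment, but the status transition is left to the controller.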
Impact: eliminates the recurring re-dispatch-onto-done-work loop + the per-cycle worker-review nag for already-gated tickets.
Activity
Ticket is done; runtime is closed. operator_cancelled_main_checkout_hazard · profile claude_opus_1m_high · cwd /home/lucienne/workspace/mission-control · uptime 16d 0h · last activity 15d 18h ago
MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
Raw console: luci · claude/anthropic/opus[1m]
Related hygiene gap (same churn family): stale runtime_sessions rows are NOT being reaped. As of 09:41 there are ~9 rows with status='stale' whose tmux_target sessions no longer exist (ticket:4324/4345/4350/4335/4317/4322/4318/4286). Dead-but-unreaped runtime rows contribute to: (a) 409-on-spawn collisions when a ticket re-dispatches against a session key the stale row still holds, and (b) confusing controller-repair churn. The orphan reaper appears to cover task_runs but not runtime_sessions. Add runtime_sessions reaping (stale + dead-tmux -> closed) to close this. Bundle with the shadow-review fix or note as a sibling under MC-4357.
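Sketch of the proposed runtime_sessions reaping (stale + dead-tmux -> closed). The table columns (`id`, `status`, `tmux_target`), the use of sqlite3, and the function names are assumptions for illustration; it also assumes `tmux_target` is a plain tmux session name that `tmux has-session -t` can resolve.

```python
import sqlite3
import subprocess
from typing import Callable


def tmux_session_alive(target: str) -> bool:
    """True if tmux reports a session for `target`."""
    try:
        proc = subprocess.run(
            ["tmux", "has-session", "-t", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No tmux binary means no tmux sessions can exist.
        return False
    return proc.returncode == 0


def reap_stale_sessions(
    conn: sqlite3.Connection,
    alive: Callable[[str], bool] = tmux_session_alive,
) -> int:
    """Close rows that are marked stale AND whose tmux session is gone."""
    rows = conn.execute(
        "SELECT id, tmux_target FROM runtime_sessions WHERE status = 'stale'"
    ).fetchall()
    dead = [(row_id,) for row_id, target in rows if not alive(target)]
    conn.executemany(
        "UPDATE runtime_sessions SET status = 'closed' WHERE id = ?", dead
    )
    conn.commit()
    return len(dead)
```

Stale rows whose session is still alive are left alone, so a live runtime is never closed by the reaper; the `alive` parameter exists only so the liveness check can be faked in tests.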
luci · 16d ago
Luci picking up this ticket (background worker).
luci · 16d ago
Ticket picked up by Luci via MC dispatcher.
MC-4371: Shadow-review false-fails cross-branch/cross-repo/non-code work and bounces orchestrator-closed tickets
Work this ticket in the live tmux runtime. Use DONE:, REVIEW:, or QUESTION: when you need MC to reflect the next state.
Luci · 16d ago
MC-4371: shadow-review now resolves done_sha cross-repo (MC/workspace/.claude + slot branches), ABSTAINS instead of failing when work is committed-but-unresolvable or non-code, and never auto-reopens an orchestrator/operator-gated done. Committed 2480c00 + pushed on slot2 batch; +19 tests, council-reviewed.
luci · 16d ago
Workflow action `review_ready` applied. Work ready for review.
api · 16d ago
Gate: fix VERIFIED SOUND — 2480c00 (slot2/mc-MC-4327, pushed). Implements the spec exactly: _ticket_diff resolves done_sha across MC + ~/workspace (slot branches) + ~/.claude; ABSTAIN (logs 'skipped' + low-conf advisory, no retry) when diff unresolvable/non-code; never auto-reopens an orchestrator/operator-gated done. 49 regression tests pass (test_mc4207_shadow_review + test_mc4291_review_retry).
NOT independently cherry-pickable to master: touches models.py + ticket_runtime.py + test_mc4207 (shared with the Home cluster) — conflicts. So it lands with the holistic slot2 -> master merge. HIGHEST-priority item in that merge: it is the bounce-storm killer.
Note: a separate reviewer fault persists — shadow_reviews row 10:20 = 'uncertain 0.0 reviewer returned non-JSON'. That is an LLM-output error in the reviewer backend, NOT the diff-blindness this fix addresses. Flag for follow-up if it recurs. Keeping in_review (closes at the slot2 merge).
api · 16d ago
ESCALATED to critical — now causing ACTIVE WORK LOSS, not just review noise.
Evidence (2026-05-28 ~14:0x): multiple workers are editing the SAME live mission-control checkout (~/workspace/mission-control, on slot2) concurrently — MC-4357 (harvest fix), MC-4378, MC-4383 (test_mc4383_artifact_gate.py appeared mid-operation). They race each other + the orchestrator. The MC-4357 keystone fix was reset out of the working tree and only survived because a worker stashed it (stash@{0}) — a near-loss of critical work.
The commit-guard is no longer just hygiene. Workers MUST operate in pool slots (~/workspace/.claude/worktrees/pool-N), NEVER the main serving checkout. Root enabler: the main mission-control checkout is parked on slot2/mc-MC-4327 (not master) and workers are landing there. Fix needs: (1) git-layer guard rejecting commits/edits in the main serving checkout, (2) ensure mission-control ticket workers claim pool slots, (3) return the main checkout to master (MC-4357 reconciliation).
DO NOT auto-reset/checkout the live mission-control tree while workers are active — that destroys in-flight work (I stopped short of this for exactly that reason).
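Sketch of item (1), the git-layer guard, written as a pre-commit hook body. It assumes pool slots are linked git worktrees of the same repository, so a commit from the main serving checkout is recognisable because its git dir equals the common git dir. `is_main_worktree` and `main` are hypothetical names. A hook only blocks commits (not edits) and can be bypassed with `--no-verify`, so it complements the pickup-side guard rather than replacing it.

```python
import subprocess
import sys
from pathlib import Path


def is_main_worktree(git_dir: Path, common_dir: Path) -> bool:
    """In the main worktree both paths are <repo>/.git; in a linked
    worktree the git dir is <repo>/.git/worktrees/<name>."""
    return git_dir.resolve() == common_dir.resolve()


def _rev_parse(flag: str) -> Path:
    out = subprocess.run(
        ["git", "rev-parse", flag],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return Path(out)


def main() -> int:
    """Return 1 (reject the commit) when run from the main checkout."""
    git_dir = _rev_parse("--absolute-git-dir")
    common_dir = _rev_parse("--git-common-dir")  # may be relative to cwd
    if is_main_worktree(git_dir, common_dir):
        print(
            "commit rejected: main serving checkout; use a pool slot worktree",
            file=sys.stderr,
        )
        return 1
    return 0
```

Installed as `.git/hooks/pre-commit`, the script would end with `sys.exit(main())`.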
luci · 16d ago
Operator mitigation + dispatch: closed stale chat runtime `mc-chat-473` that was sitting in an interactive menu with cwd=/home/lucienne/workspace/mission-control; it was a main-checkout edit hazard. Reclaimed stale pool slot-0 from completed MC-4220.
Returning this ticket to `todo` as the canonical root fix for the remaining control-plane hygiene problem: workers/chat runtimes must not be able to edit the main serving checkout, mission-control ticket work must claim pool slots, and the main checkout must be reconciled back to master only after MC-4357/MC-4383 WIP is safely preserved/landed. Acceptance: add guard/tests, prove fresh pickup lands in pool slot, prove stale main-checkout chat/runtime is closed or read-only, then reconcile MC-4357 stash safely.
luci · 16d ago
Luci picking up this ticket (background worker).
luci · 16d ago
[failed_to_inject] runtime_busy: Ticket runtime is not ready for input (status=running). Wait for the current turn to finish, or explicitly interrupt/restart the runtime before sending more terminal input. This message
Ticket picked up by Luci via MC dispatcher.
MC-4371: Shadow-review false-fails cross-branch/cross-repo/non-code work and bounces orchestrator-closed tickets
Work this ticket in the live tmux runtime. Use DONE:, REVIEW:, or QUESTION: when you need MC to reflect the next state.
luci · 16d ago
[failed_to_inject] runtime_busy: Ticket runtime is not ready for input (status=running). Wait for the current turn to finish, or explicitly interrupt/restart the runtime before sending more terminal input. This message
Ticket picked up by Luci via MC dispatcher.
MC-4371: Shadow-review false-fails cross-branch/cross-repo/non-code work and bounces orchestrator-closed tickets
Work this ticket in the live tmux runtime. Use DONE:, REVIEW:, or QUESTION: when you need MC to reflect the next state.
luci · 16d ago
Ticket runtime send failed while dispatching.
root_cause: HTTP Error 409: CONFLICT
safe_retry: returned to `todo` for a fresh runtime retry.
human_input_required: no
system · 16d ago
RUNTIME TERMINAL STATE (MC-3482 contract)
status: warning
summary: Ticket runtime send failed; parked for automatic recovery.
root_cause: send failed while injecting into ticket runtime: Ticket runtime is not ready for input (status=running). Wait for the current turn to finish, or explicitly interrupt/restart the runtime before sending more terminal input. This message was not sent.
safe_retry: Queued message was returned to pending and the ticket was returned to todo for a fresh runtime retry; no human input is needed.
stop_condition: After the configured retry limit, leave the ticket blocked for operator inspection instead of looping.
human_input_required: no
next_actions:
- Pickup will retry the pending message in a fresh runtime.
- Inspect runtime/send only if the retry fails again.
artifacts:
- ticket:MC-4371
luci · 16d ago
Luci picking up this ticket (background worker).
luci · 16d ago
Ticket runtime send failed while dispatching.
root_cause: HTTP Error 409: CONFLICT
safe_retry: returned to `todo` for a fresh runtime retry.
human_input_required: no
system · 16d ago
RUNTIME TERMINAL STATE (MC-3482 contract)
status: error
summary: Ticket runtime send failed; parked for automatic recovery.
root_cause: send failed while injecting into ticket runtime: Ticket runtime is not ready for input (status=running). Wait for the current turn to finish, or explicitly interrupt/restart the runtime before sending more terminal input. This message was not sent.
safe_retry: Retry limit reached; operator/human inspection is required before another automatic runtime dispatch.
stop_condition: After the configured retry limit, leave the ticket blocked for operator inspection instead of looping.
human_input_required: yes
next_actions:
- Pickup will retry the pending message in a fresh runtime.
- Inspect runtime/send only if the retry fails again.
artifacts:
- ticket:MC-4371
luci · 16d ago
Safety correction: the re-dispatch reproduced the bug — MC started `mc-MC-4371` with cwd=/home/lucienne/workspace/mission-control instead of a pool slot, then hit runtime_busy/409. I immediately killed that unsafe runtime and marked it cancelled. I am not letting scheduler/pickup retry this ticket through the broken path; next attempt must be manual/safe-worktree or after the pickup cwd guard is fixed.
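Sketch of the pickup cwd guard referred to above: refuse to start a ticket runtime unless its cwd is inside a pool slot. `UnsafeCwdError` and `assert_pool_cwd` are hypothetical names; the paths in the usage example are the ones quoted in this ticket.

```python
from pathlib import Path


class UnsafeCwdError(RuntimeError):
    """Raised when a runtime would start outside a pool slot."""


def assert_pool_cwd(cwd, pool_root, main_checkout) -> Path:
    """Return the resolved cwd if it lies under pool_root, else raise."""
    cwd = Path(cwd).resolve()
    pool_root = Path(pool_root).resolve()
    main = Path(main_checkout).resolve()
    # Reject the main serving checkout and anything below it.
    if cwd == main or main in cwd.parents:
        raise UnsafeCwdError(f"refusing main serving checkout: {cwd}")
    # Require a directory strictly below the pool root (a slot or deeper).
    if pool_root not in cwd.parents:
        raise UnsafeCwdError(f"cwd is not inside a pool slot: {cwd}")
    return cwd
```

Usage, under the same assumptions:

```python
POOL = "/home/lucienne/workspace/.claude/worktrees"
MAIN = "/home/lucienne/workspace/mission-control"
assert_pool_cwd(POOL + "/pool-2", POOL, MAIN)  # returns the resolved path
assert_pool_cwd(MAIN, POOL, MAIN)              # raises UnsafeCwdError
```

Failing closed here means a bad slot claim blocks the dispatch instead of landing a worker in the main checkout.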
luci · 16d ago
Controller gate update: validated the pushed MC-4371 safe branches and landed the top-level workspace half. Evidence: `/home/lucienne/workspace/state/manual-worktrees/mc-4371-workspace` commit e2a26abd (worktree_pool singleton by repo) passed `python3 -m pytest tests/test_mc_pickup_pool_validation.py -q` (6 passed); fast-forwarded `/home/lucienne/workspace` to e2a26abd and pushed origin/master; reran the same deployed-root test (6 passed). Mission-Control half remains pending: `/home/lucienne/workspace/state/manual-worktrees/mc-4371-mission-control` commit 1f5e081 passed `tests/test_mc4371_runtime_isolation.py tests/test_mc4207_shadow_review.py tests/test_mc4291_review_retry.py` (53 passed) and `tests/test_ticket_runtime.py` (73 passed), but the live `/home/lucienne/workspace/mission-control` checkout has uncommitted MC-4383 WIP touching ticket_runtime.py, so I did not cherry-pick/restart over it. No Elmar input needed; next controller step is preserve/reconcile that WIP, then land 1f5e081 and restart/smoke MC.
luci · 16d ago
Controller normalization: this remains intentionally parked while MC-4383 reconciles the dirty mission-control checkout WIP that blocks safe landing/restart. MC-4383 has been released to todo for pickup. No Elmar input required.
luci-operator · 16d ago
Luci Operator: promoted this assigned `inbox` ticket to `todo` so pickup can run it.
luci · 16d ago
New live repro from MC-4381: shadow-review/control-plane still produced a false advisory FAIL after a controller-verified cross-repo workspace commit.
Facts:
- MC-4381's real work landed on /home/lucienne/workspace origin/master at 64926ae3.
- The advisory FAIL inspected the mission-control cwd/slot done_sha eb4de42 and concluded no task files changed.
- This made a done ticket look failed to Elmar and undermines the MC orchestration gate.
Controller decision: this ticket remains the canonical fix; priority raised to critical. Acceptance must include landing/enabling the cross-repo done_sha resolver on the live path and preventing post-close advisory FAIL comments from contradicting an authoritative controller gate without opening a clear follow-up.
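Sketch of what a cross-repo done_sha resolver could look like, so that a commit such as 64926ae3 is looked up in every candidate repo before the reviewer concludes that nothing changed. This is not the landed `_ticket_diff` code; `commit_exists` and `resolve_done_sha` are hypothetical names, and the repo list is supplied by the caller.

```python
import subprocess
from pathlib import Path
from typing import Iterable, Optional


def commit_exists(repo: Path, sha: str) -> bool:
    """True if `sha` names a commit in `repo`'s object store.

    `git cat-file -e` checks the object store, so the commit is found
    regardless of which branch (master, a slot branch, ...) holds it.
    """
    try:
        proc = subprocess.run(
            ["git", "-C", str(repo), "cat-file", "-e", f"{sha}^{{commit}}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # git binary not installed: nothing can be resolved.
        return False
    return proc.returncode == 0


def resolve_done_sha(sha: str, repos: Iterable[Path]) -> Optional[Path]:
    """Return the first repo that contains `sha`, or None if none does."""
    for repo in repos:
        if commit_exists(repo, sha):
            return repo
    return None
```

A `None` result is the "committed-but-unresolvable" case, which should lead to an abstain rather than a fail.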
luci · 16d ago
Controller normalization after full pending audit: Canonical root blocker for controller/operator churn: pickup/shadow-review must stop unsafe main-checkout dispatch and false review bounces. Finish this before resuming MC-4357/slot2 merge.
luci-controller · 16d ago
Controller dependency update: MC-4383 is now verified done on pushed branch feat/mc-4383 (a99812c). MC-4371 is no longer blocked by unreconciled MC-4383 WIP. It remains parked for controller-owned landing of the mission-control half because the live mission-control checkout is still the slot2 branch with dirty in-flight WIP; do not auto-dispatch this through pickup. Next controller action is to preserve/identify the dirty live checkout state, then land/restart/smoke the already-verified MC-4371 mission-control commit 1f5e081 safely. No Elmar input required.
Controller decision
luci-controller · 16d ago
Controller gate/landing complete: preserved the dirty live mission-control checkout as artifact /home/lucienne/workspace/ticket-attachments/MC-4371/live-dirty-before-mc4371-land-20260528T164001+0200 and git stash `stash@{0}` (`MC-4371 preserve live dirty WIP before landing 20260528T164001`), then landed verified MC-4371 mission-control commit 1f5e081 onto live slot2 as f72fdd5 and pushed origin/slot2/mc-MC-4327. Validation: `python3 -m pytest tests/test_mc4371_runtime_isolation.py tests/test_mc4207_shadow_review.py tests/test_mc4291_review_retry.py tests/test_ticket_runtime.py -q` => 126 passed; restarted luci-dashboard with sudo systemctl and smoked /, /board, and /ticket/4371 => HTTP 200. No Elmar input required.
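Sketch of the preserve-before-landing step described above, covering only the stash half (the artifact copy under ticket-attachments is not shown). `is_dirty` and `preserve_wip` are hypothetical names, not the controller's actual tooling.

```python
import subprocess
from pathlib import Path


def is_dirty(porcelain: str) -> bool:
    """True if `git status --porcelain` output lists any change."""
    return any(line.strip() for line in porcelain.splitlines())


def preserve_wip(repo: Path, label: str) -> bool:
    """Stash tracked and untracked changes; return True if a stash was made."""
    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    if not is_dirty(status):
        return False  # clean tree: nothing to preserve
    subprocess.run(
        ["git", "-C", str(repo), "stash", "push",
         "--include-untracked", "--message", label],
        check=True,
    )
    return True
```

`git stash push` removes the changes from the working tree, so this must only run once no worker is still active in that checkout.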