Ticket is done; runtime is closed. · cwd /home/lucienne/workspace/state/control-room-worktrees/mc-4634-watchdog-failed-weekly-deep-research-202-a858dd · uptime 9d 12h · last activity 9d 12h ago
Description
MC-4634
Task `weekly-deep-research` finished with status **failed** at 2026-06-03 05:07:00.
Error (if captured):
```
ro exit status 1.
[pick] fleet-watch (Narrowbody fleet watch) angle#0
[pick] description: This week's lead angle: 737 MAX A320neo delivery delay backlog in the last 3 months. Also cover: CFM LEAP engine production durability fix in the last 3 months; Pratt Whitney GTF powder metal inspection groundings in the last 3 months; spare engine shortage lease rate narrowbody in the last 3 months. Framing — for a SA low-cost airline running a 737/A320 fleet — what does this mean for delivery slots, engine availability (LEAP/GTF), spare-engine cost, MRO turnaround, lease rates and effective capacity? Quantify AOG and cost impact where sources give numbers.. Emphasise developments and analysis from the last 3 months and bring a fresh angle distinct from a generic overview.
[pipeline] /usr/bin/python
```
This may be a one-off or a recurring issue — check previous runs in mc.db `task_runs` table.
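To tell a one-off from a recurring failure, the run history can be queried directly. The following is a minimal sketch only: the ticket names the `task_runs` table but not its schema, so the column names `task_name`, `status`, and `finished_at` are assumptions and should be checked against the real table first.

```python
import sqlite3


def recent_runs(conn, task_name, limit=10):
    """Return (status, finished_at) rows for one task, newest first."""
    # Column names are assumed, not taken from the ticket; confirm them
    # with `PRAGMA table_info(task_runs)` before relying on this query.
    cur = conn.execute(
        "SELECT status, finished_at FROM task_runs "
        "WHERE task_name = ? ORDER BY finished_at DESC LIMIT ?",
        (task_name, limit),
    )
    return cur.fetchall()


def consecutive_failures(runs):
    """Count how many of the most recent runs failed in a row."""
    count = 0
    for status, _finished_at in runs:
        if status != "failed":
            break
        count += 1
    return count
```

With a real database this would be called as `recent_runs(sqlite3.connect("mc.db"), "weekly-deep-research")`; a count above one from `consecutive_failures` would suggest a recurring issue rather than a one-off.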
Activity
Details: Done · High · Luci
State
Done
Closed
People
Timing / Details
scheduler-watchdog (scheduled)
infra
10d ago
9d ago
Advanced / Operator evidence
Routing owner
Operator console
direct_worker_done_recovered
MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
Raw console: luci-controller · claude-code
Root cause of weekly-deep-research failure: NotebookLM Google session expired. storage_state.json (Jun 2) now redirects to Google sign-in; pipeline dies at Phase 3 nblm_create with "Authentication expired or invalid". This is an auth wall — I cannot type Google creds. Re-auth needed via VNC (100.118.207.3:5901, display :1): open terminal, run `notebooklm login`, complete Google sign-in. This recurs ~weekly and silently kills every Wednesday deep-research dive. After you re-auth, I'll re-run weekly_cycle.py --topic fleet-watch to recover this week's missed dive and confirm green. Can you re-auth via VNC?
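A cheap pre-flight check could flag a stale session before the pipeline reaches Phase 3. The sketch below assumes `storage_state.json` follows the Playwright storage-state layout (a top-level `cookies` list whose entries carry `domain`, `name`, and `expires` in epoch seconds, with `-1` for session cookies); the ticket does not confirm that format. It also cannot detect a session that Google has invalidated server-side, so a clean result does not prove the login still works.

```python
import json
import time


def load_state(path):
    """Read a storage-state JSON file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def expired_google_cookies(state, now=None):
    """Return names of Google cookies whose expiry time has passed."""
    now = time.time() if now is None else now
    expired = []
    for cookie in state.get("cookies", []):
        if "google.com" not in cookie.get("domain", ""):
            continue
        expires = cookie.get("expires", -1)
        # Session cookies (expires <= 0) have no fixed expiry to compare.
        if expires is not None and 0 < expires <= now:
            expired.append(cookie.get("name", "?"))
    return expired
```

A non-empty result from `expired_google_cookies(load_state("storage_state.json"))` would be a reason to request re-authentication before the scheduled run rather than after it fails.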
luci · 10d ago
Luci picking up this ticket (background worker).
luci · 10d ago
Ticket picked up by Luci via MC dispatcher.
MC-4634: [watchdog:failed:weekly-deep-research:2026-06-03 05] Task failed: weekly-deep-research (failed)
Work this ticket in the live tmux runtime. Use DONE:, REVIEW:, or QUESTION: when you need MC to reflect the next state.
api · 10d ago
Controller: confirmed genuine auth wall (NBLM Google session expired — non-coding, cannot type creds). Consolidated with MC-4634/4635 (same root cause). Escalated to Elmar with the re-auth path (Auth Portal http://100.118.207.3:8788 → NotebookLM → Re-authenticate → Save Cookies; capture-hang fixed by MC-4637). Holding in needs_input until Elmar re-auths; then re-run the task. Not a code fix.
RUNTIME TERMINAL STATE (MC-3482 contract)
status: warning
summary: Live ticket runtime timed out without a harvestable verdict; parked for automatic runtime recovery.
root_cause: harvest timeout — no DONE/REVIEW/QUESTION verdict; write-back preserved substantive transcript tail
safe_retry: Runtime was killed and the ticket was returned to todo for a fresh Luci-owned retry; no human input is needed.
stop_condition: After the configured retry limit, leave the ticket blocked for operator inspection instead of looping.
human_input_required: no
next_actions:
- Pickup will start a fresh runtime from MC history.
- Inspect the pane log artifact only if the retry fails again.
artifacts:
- ticket:MC-4634
- pane_log:/home/lucienne/workspace/logs/mc-MC-4634.tmux.log
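The harvest-and-retry behaviour described in this terminal state can be sketched as follows. This is an illustration under assumptions, not the MC implementation: the function names and the `"harvested"` label are hypothetical, while the `DONE:`/`REVIEW:`/`QUESTION:` prefixes and the `todo` and `blocked` states come from the ticket text.

```python
VERDICT_PREFIXES = ("DONE:", "REVIEW:", "QUESTION:")


def harvest_verdict(transcript_tail):
    """Return (kind, text) for the last verdict line in the tail, or None."""
    for line in reversed(transcript_tail.splitlines()):
        stripped = line.strip()
        for prefix in VERDICT_PREFIXES:
            if stripped.startswith(prefix):
                return prefix.rstrip(":"), stripped[len(prefix):].strip()
    return None


def next_state_after_timeout(verdict, retries_so_far, retry_limit):
    """Decide what happens to the ticket once the runtime has timed out."""
    if verdict is not None:
        return "harvested"  # hypothetical label: a verdict was found
    if retries_so_far < retry_limit:
        return "todo"  # fresh retry, no human input needed
    return "blocked"  # stop condition: leave for operator inspection
```

Scanning from the end of the tail means a later verdict overrides an earlier one, which seems the safer reading when a worker asks a question and then finishes.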
api · 10d ago
Council (Codex/Gemini/GLM/Opus; Kimi timed out — 4/5 = valid quorum).
Root cause: reconcile_background_tasks() — the path ALL background tasks take — had NO permanent-auth detection. NBLM (browser-session) auth expiry only got a swallowable soft notify + a 'high' ticket after 3 consecutive fails (~3 weeks for a weekly task). The immediate critical needs_input/elmar + forced-Telegram path existed only in the foreground run_task() handler, which background tasks never reach. Systemic bug behind the MC-4633/4634/4635 cluster.
Fix (commit 0b71b90d): classify NBLM expiry as permanent-auth; recognise the NotebookLM provider; provider-aware fix hint (notebooklm login via VNC vs OAuth portal); single _alert_permanent_auth_failure() helper shared by both paths; reconcile now alerts on the FIRST permanent-auth bg failure and does not suspend (auto-recovers after re-auth).
Council consensus issue (accepted): bg path scanned stderr+stdout unconditionally — a long stdout report quoting '401/403/notebooklm login' as content could misfire a critical alert. Fixed by mirroring the foreground classifier: stderr-first AND gated behind _is_external_runtime_failure. Added a regression test proving benign stdout auth-strings are NOT escalated.
Rejected after verifying against code: GLM 'sidecars not reset' (reconcile unlinks them + flips status off running) and Codex 'ticket throw suppresses Telegram' (create_mc_ticket is try/except:pass).
Follow-up (separate ticket, not this one): open-ticket dedup + Telegram cooldown for sub-daily permanent-auth tasks.
Tests: 26/26 pass (6 new). Non-UI → no Tessa.
NOTE: this fix makes the recurrence loud + correct. The task itself still needs Elmar to re-authenticate NotebookLM via VNC (credential wall, already escalated) before weekly-deep-research runs green.
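The classifier shape described above (gate first, then stderr before stdout) can be sketched roughly as below. This is not the code from commit 0b71b90d: the pattern list, the gate condition, and every function name here are assumptions for illustration, and "stderr-first" is read as "consult stdout only when stderr is empty".

```python
import re

# Illustrative patterns only; the real pattern list is not shown in the ticket.
PERMANENT_AUTH_PATTERNS = [
    re.compile(r"authentication expired or invalid", re.IGNORECASE),
    re.compile(r"\b(401|403)\b"),
    re.compile(r"notebooklm login", re.IGNORECASE),
]


def is_external_runtime_failure(exit_code):
    """Hypothetical gate: only runs that actually failed are classified."""
    return exit_code != 0


def is_permanent_auth_failure(exit_code, stderr, stdout):
    """Classify a finished run as a permanent-auth failure or not."""
    if not is_external_runtime_failure(exit_code):
        return False
    # stderr-first: stdout is consulted only when stderr is empty, so a
    # report that merely quotes auth strings cannot trigger an alert.
    text = stderr if stderr.strip() else stdout
    return any(p.search(text) for p in PERMANENT_AUTH_PATTERNS)


def fix_hint(provider):
    """Provider-aware hint, following the two paths named in the ticket."""
    if provider == "notebooklm":
        return "run `notebooklm login` via VNC"
    return "re-authenticate via the OAuth portal"
```

Under these assumptions, a successful run whose stdout mentions `401` or `notebooklm login` is never escalated, which mirrors the regression test the council asked for.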
luci-controller · 10d ago
[control-room-dispatch] Control Room dispatched MC-4634 to a Claude Code worker.
Worktree: /home/lucienne/workspace/state/control-room-worktrees/mc-4634-watchdog-failed-weekly-deep-research-202-a858dd
Branch: cr/mc-4634-watchdog-failed-weekly-deep-research-202-a858dd
tmux: cr-MC-4634
Expected check-in: 2026-06-03T18:40:37.978339+00:00
luci-controller · 10d ago
[control-room-recover] MC-4634: recovered DONE from a finished direct worker whose tmux was gone. Ticket → in_review. Summary: MC-4634 — background-task permanent-auth alerting fixed. branch cr/mc-4634-watchdog-failed-weekly-deep-research-202-a858dd @ 0b71b90d (pushed); scheduler.py + tests/test_scheduler_regressions.py; 26/26 tests; council 4/5 (one consensus FP-scoping fix applied + regression test added). NBLM re-auth via VNC still required for the task to run green (credential wall, already escalated).