Luci Agent Assessment — WAT Architecture Review

type: report
tags: [assessment, luci, mission-control, wat, architecture]
created: 2026-05-13
status: active

Date: 2026-05-13
Scope: Luci's current agent setup, reviewed against the MC Runtime Architecture, the Runtime Workbench PRD, and industry WAT framework (Workflows, Agents, Tools) best practices
Analysts: Architecture Analyst, Gap Analyst, Industry Benchmark Analyst (specialist subagents)
Synthesised by: Lucienne

Executive Summary

Luci/Mission Control is a mature Level 3 (workflow-driven) agentic system, well beyond one-shot prompting, with a complete ticket lifecycle, six defined agent roles, tmux-backed persistent runtimes, a two-tier memory architecture, and mobile control surfaces. It leads the commercial framework landscape in five areas (memory architecture, human-in-the-loop control, cost-aware profiles, mobile control, runtime persistence) and trails it in four (conditional workflow routing, structured output protocol, agent-to-agent handoff, observability).

Overall WAT compliance: 7.5/10. Of the ten architecture doc acceptance tests, eight pass and two pass partially, with no hard failures. The biggest opportunities are conditional routing (LangGraph-inspired) and structured agent handoffs (CrewAI-inspired).

Key finding: The current architecture is sound and should NOT be rewritten into an external framework. The advantages of the home-grown approach (tmux runtime, real production data, cost awareness) outweigh the advantages of off-the-shelf frameworks. Targeted feature adoption is the right strategy.

1. Architecture Doc Compliance — 7/10 PASS

Acceptance Test Results

#	Test	Status
A1	Full ticket lifecycle without leaving MC	✅ PASS
A2	Workbench full control surface	✅ PASS
A3	Durable send (History first, then tmux)	✅ PASS
A4	History scroll-preservation during polling	⚠️ PARTIAL
A5	Runtime feed follow-bottom with scroll-up override	✅ PASS
A6	QUESTION: warm runtime + needs_input	✅ PASS
A7	Root DONE: review-ready keeps runtime	✅ PASS
A8	Send to Larry: pickupable	✅ PASS
A9	Return for fixes: requeue implementation	✅ PASS
A10	Workflow history visible without child tickets	⚠️ PARTIAL

8/10 PASS, 2/10 PARTIAL, 0/10 FAIL. The two partials are scroll-preservation edge cases and child-ticket navigation depth; both are fixable without architecture changes.
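
Test A3 ("History first, then tmux") describes an ordering guarantee that a short sketch may make easier to picture. The following is a minimal illustration only, not Mission Control's implementation: the in-memory `history` list stands in for the real History store, and `durable_send`, `tmux_send`, and the entry fields are hypothetical names. The only real interface used is the `tmux send-keys` command.

```python
import subprocess
from typing import Callable


def tmux_send(target: str, text: str) -> None:
    """Type text into a tmux pane, then press Enter."""
    # -l sends the text literally so it is never parsed as a key name.
    subprocess.run(["tmux", "send-keys", "-t", target, "-l", text], check=True)
    subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)


def durable_send(
    history: list,
    target: str,
    text: str,
    deliver: Callable[[str, str], None] = tmux_send,
) -> dict:
    """Record the message first, then attempt delivery.

    If delivery fails, the entry stays in history marked undelivered,
    so the message is never lost and can be retried later.
    """
    entry = {"target": target, "text": text, "delivered": False}
    history.append(entry)  # step 1: persist (assumed stand-in for History)
    try:
        deliver(target, text)  # step 2: push to the runtime
    except (OSError, subprocess.CalledProcessError):
        return entry
    entry["delivered"] = True
    return entry
```

Passing a different `deliver` callable makes the ordering testable without a running tmux server.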

2. WAT Framework Scores

Layer	Score	Assessment
Workflows (W)	7/10	Strong sequential template; lacks conditional routing, parallel fan-out, iterative loops. Single template (`dev_review_qa`).
Agents (A)	8/10	Six well-defined roles with clear purposes and tool access control. Gaps: no agent-to-agent handoff protocol, no capability registry.
Tools (T)	7/10	Rich tool surface (tmux, CLI, API, browser, Telegram, scheduler). Gaps: no tool discovery, no result caching, limited composition.

3. What Luci Does Better Than Frameworks

These are genuine competitive advantages. Don't lose them.

Advantage	vs Industry	Evidence
Production-grade ticket lifecycle	+2	891 tickets, 18K messages, 39K task runs. Frameworks assume you build this.
Persistent, inspectable runtimes	+2	tmux-backed. SSH in and see exactly what's happening. No framework offers this.
Two-tier memory architecture	+2	Episodic (mc.db) + semantic (vault.db) with dream cycle promotion and salience re-ranking.
Mobile-first control surface	+1	Telegram bridge + responsive Workbench. Frameworks are API-only.
Cost-aware runtime profiles	+1	Explicit tool policy: cheap API for tool-less work, CLI for code.
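
The tool policy in the last row (cheap API for tool-less work, CLI for code) can be pictured as a single routing decision. This is a hedged sketch: the `Task` fields and the profile names `cheap_api` and `cli_runtime` are placeholders invented for illustration, not names taken from Mission Control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; field names are illustrative."""

    kind: str          # e.g. "triage", "summary", "implementation"
    needs_tools: bool  # True when the task must run commands or edit files


def select_profile(task: Task) -> str:
    """Pick a runtime profile from the task's tool requirement.

    Tool-less work goes to a cheap API profile; anything that needs
    tools goes to a CLI runtime. Both profile names are placeholders.
    """
    return "cli_runtime" if task.needs_tools else "cheap_api"
```

Keeping the decision in one function makes the policy explicit and easy to audit, which seems to be the property the table credits.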

4. What Luci Does Worse Than Frameworks

These are the gaps to close.

Gap	vs Industry	Fix	Effort
No conditional workflow routing	-2	Add phase conditions to workflow templates (if pass → advance, if fail → return_to_fixes)	Medium
No structured output protocol	-2	Adopt JSON-mode output alongside sentinel-based harvest	High
No agent-to-agent handoff protocol	-2	Extend continuity pack for role transitions (changed_files, test_results, findings)	Medium
No observability framework	-1	Structured workflow events (JSONL per workflow step)	Low-Medium
No dynamic agent composition	-1	Defer — explicitly against design principles ("Luci as default, explicit routing")	N/A
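
The handoff fix above names three payload fields (changed_files, test_results, findings). One possible shape for such a pack is sketched below. The class name, the `from_role`/`to_role` fields, and the helper are assumptions made for illustration; the existing continuity pack format is not shown in this report and may differ.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffPack:
    """Hypothetical payload carried across a role transition."""

    from_role: str
    to_role: str
    changed_files: list[str] = field(default_factory=list)
    test_results: dict[str, str] = field(default_factory=dict)
    findings: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise with stable key order so packs diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)


def missing_fields(pack: HandoffPack) -> list[str]:
    """Return the payload fields that are still empty."""
    payload = ("changed_files", "test_results", "findings")
    return [name for name in payload if not getattr(pack, name)]
```

A receiving role could call `missing_fields` before starting work and ask for the gaps instead of rediscovering context.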

5. PRD User Journey Gaps

Journey	Status	Gap
1. Plan with Luci	✅ Complete	—
2. Do it now	✅ Complete	—
3. Manage active ticket	⚠️ Partial	No AI-driven "Next Action" strip; attachment flow limited
4. Recover runtime	⚠️ Partial	No one-click "Retry as Todo"; provider failure recovery not self-service
5. WAT review gate	⚠️ Partial	No mandatory review policy; findings don't auto-route to fixes
6. Raw session escape hatch	✅ Complete	—

6. Roadmap Gap Analysis

The architecture doc lists 10 roadmap items. Current status:

#	Item	Status	Recommendation
1	Mandatory WAT review policy	⬜ Not started	P0 — define when council/review is required
2	Role-specific runtime profile overrides	⬜ Not started	P1 — per-phase profile selection
3	Cheaper API profiles for triage/summaries	⬜ Not started	P1 — tool-less task routing
4	Kimi CLI smoke checks	⬜ Not started	P2 — expand provider coverage
5	Agent Registry / supervisor view	⬜ Not started	P2 — operational visibility
6	Bounded supervisor check-ins	⬜ Not started	P2 — active pane summaries
7	Runtime archive on close	⬜ Not started	P1 — audit trail completeness
8	Telegram per-target queueing	⬜ Not started	P2 — rapid message handling
9	Pre-deploy regression command	⬜ Not started	P1 — release safety
10	Workflow/council hooks	⬜ Not started	P1 — second-opinion integration

New items identified by this review:

#	New Item	Recommendation
N1	Conditional workflow routing	P0 — single biggest WAT maturity improvement
N2	Structured agent handoff packs	P1 — reduce context loss between roles
N3	Multiple workflow templates (simple_fix, research_only, security_audit)	P1 — right-size the workflow to the task
N4	Structured workflow events (observability)	P2 — JSONL per step
N5	Next Action intelligence in Workbench	P2 — PRD requirement, user experience
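
Item N3 could be as simple as a lookup from template name to an ordered list of roles. The sketch below uses the three template names proposed in this report; the lower-case role keys and the `phases_for` helper are hypothetical, and the existing `dev_review_qa` template is left out because its phases are not listed here.

```python
from __future__ import annotations

# Template name -> ordered roles, following the pipelines proposed in this report.
TEMPLATES: dict[str, list[str]] = {
    "simple_fix": ["larry"],
    "research_only": ["scott"],
    "security_audit": ["security", "reviewer"],
}


def phases_for(template: str) -> list[str]:
    """Return the ordered phases for a template, ending in 'done'."""
    try:
        return TEMPLATES[template] + ["done"]
    except KeyError:
        raise ValueError(f"unknown workflow template: {template!r}") from None
```

Rejecting unknown names up front keeps a mistyped template from silently falling back to the full pipeline.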

7. Prioritised Recommendations

P0 — Do Next (High Impact, Aligns with Architecture)

#	Action	Why	Effort
R1	Conditional workflow routing	Close the biggest WAT maturity gap. Pass/fail branching on council review, Tessa validation, and Larry implementation.	Medium
R2	Mandatory WAT review policy	Architecture doc Roadmap #1. Define when council/review is required vs optional.	Low
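
To make R1 concrete, pass/fail branching can be expressed as a small routing table. This is a sketch under stated assumptions: the phase names and their order (implementation, then council review, then validation) are guesses for illustration, and only `return_to_fixes` and the requeue-to-implementation behaviour come from this report.

```python
from __future__ import annotations

# phase -> outcome -> next phase. Phase names and order are assumptions.
ROUTES: dict[str, dict[str, str]] = {
    "implementation": {"pass": "council_review", "fail": "return_to_fixes"},
    "council_review": {"pass": "validation", "fail": "return_to_fixes"},
    "validation": {"pass": "done", "fail": "return_to_fixes"},
}


def next_phase(current: str, passed: bool) -> str:
    """Advance on pass; route to fixes on fail.

    'return_to_fixes' always requeues implementation, whatever the outcome.
    """
    if current == "return_to_fixes":
        return "implementation"
    if current not in ROUTES:
        raise ValueError(f"no routing defined for phase {current!r}")
    return ROUTES[current]["pass" if passed else "fail"]
```

Because the conditions live in data, adding a branch to a template would not require new control flow.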

P1 — Do Soon (High Value, Build on Foundation)

#	Action	Why	Effort
R3	Structured agent handoff packs	Extend `build_continuity_pack` to role transitions. Larry → Council should carry changed_files + diff summary.	Medium
R4	Multiple workflow templates	`simple_fix` (Larry → done), `research_only` (Scott → done), `security_audit` (security + reviewer → done). Right-size the pipeline.	Low
R5	Formal workflow state machine	Allowed transitions prevent invalid states. Currently implicit in status + action history.	Medium
R6	Runtime archive on close	Transcript excerpt, changed files, commands run, cost, close reason. Audit trail.	Low
R7	Pre-deploy regression command	Combined runtime + workflow + Telegram + scheduler smoke test. Release safety.	Medium
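
R5 amounts to making the allowed transitions explicit and rejecting everything else. A minimal sketch follows; the state names are illustrative assumptions loosely based on statuses mentioned in this report (`needs_input`, review-ready), not the actual status set.

```python
from __future__ import annotations

# state -> states it may move to. The state names are assumptions.
ALLOWED: dict[str, set[str]] = {
    "todo": {"running"},
    "running": {"needs_input", "review_ready"},
    "needs_input": {"running"},
    "review_ready": {"running", "done"},
    "done": set(),
}


class InvalidTransition(Exception):
    """Raised when a status change is not in the allowed set."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

With a guard like this, an invalid state becomes an error at the point of change instead of something inferred later from action history.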

P2 — Plan For (Good Value, Higher Effort)

#	Action	Why	Effort
R8	Agent Registry view	Queryable capabilities, status, tool access. Enables supervisor pattern.	Medium
R9	Structured workflow events	JSONL per workflow step. Observability without heavy framework.	Low-Medium
R10	Next Action intelligence	PRD requirement. Derive next step from workflow state + recent activity.	Medium
R11	Telegram per-target queueing	Architecture doc Roadmap #8.	Medium
R12	Role-specific runtime profiles	Per-phase profile overrides. Tessa gets browser-enabled profile, Council gets API profile.	Medium
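
R9's "JSONL per workflow step" needs very little machinery: one JSON object appended per line. The sketch below shows one way to do it; the event field names (`ts`, `workflow_id`, `step`, `status`) and both function names are assumptions, not an existing Mission Control schema.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


def append_event(path: str, workflow_id: str, step: str, status: str, **extra) -> dict:
    """Append one workflow event as a single JSON line and return it."""
    event = {
        "ts": time.time(),
        "workflow_id": workflow_id,
        "step": step,
        "status": status,
        **extra,
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_events(path: str) -> list[dict]:
    """Load every event from a JSONL file, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Append-only lines mean a partially written file still parses up to the last complete event, and the output works with ordinary line-oriented tools.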

P3 — Explore (Strategic, Longer Horizon)

#	Action	Why	Effort
R13	Structured output protocol	JSON-mode output to complement sentinel-based harvest. Start with new agents.	High
R14	Kimi CLI smoke checks	Expand provider coverage.	Medium
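
R13 proposes JSON-mode output to complement, not replace, sentinel-based harvest. One way to combine the two is to try JSON first and fall back to scanning for sentinel lines. In this sketch the `DONE:` and `QUESTION:` prefixes are taken from the acceptance tests above, but the exact line format, the `status` key, and the result shape are assumptions.

```python
from __future__ import annotations

import json

# Sentinel prefixes named in the acceptance tests; exact format is assumed.
SENTINELS = ("DONE:", "QUESTION:")


def harvest(output: str) -> dict:
    """Extract an agent result, preferring JSON over sentinel lines."""
    text = output.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict) and "status" in parsed:
        return {"source": "json", **parsed}

    # Fallback: the last line that starts with a known sentinel wins.
    for line in reversed(text.splitlines()):
        stripped = line.strip()
        for sentinel in SENTINELS:
            if stripped.startswith(sentinel):
                return {
                    "source": "sentinel",
                    "status": sentinel.rstrip(":").lower(),
                    "message": stripped[len(sentinel):].strip(),
                }
    return {"source": "none", "status": "unknown", "message": ""}
```

Existing agents would keep working through the fallback path while new agents adopt JSON, which matches the "start with new agents" suggestion.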

Defer (Against Design Principles)

#	Action	Why Defer
D1	Dynamic agent composition	Architecture doc: "target is not 'many agents by default'"
D2	In-process agent runtime	tmux-backed is a strength, not a weakness
D3	Framework migration (LangGraph/CrewAI)	Would lose production advantages

8. Maturity Model

Level	Description	Status
L1	Single agent, one-shot tasks	✅ Exceeded
L2	Multi-agent, manual dispatch	✅ Exceeded
L3	Workflow-driven, phased, state tracking	✅ Current
L4	Conditional routing, branching on results	⬜ Next target
L5	Self-orchestrating, agents decide workflow	⬜ Explicitly deferred

The system should target Level 4. That means conditional routing (P0), multiple templates (P1), and structured handoffs (P1). Level 5 is the right thing to defer — the design philosophy of "Luci as default, explicit routing" is correct for a system where Elmar is the primary user and wants to stay in control.

9. Conclusion

Luci/Mission Control is a well-architected agentic system that's been refined through real production use (891 tickets, 18K messages, 39K task runs). It has genuine advantages over commercial frameworks in memory architecture, human-in-the-loop control, cost awareness, and runtime persistence.

The primary gap is workflow maturity — a single sequential template with manual dispatch is appropriate for Level 3 but insufficient for Level 4. Adding conditional routing, multiple templates, and structured agent handoffs would close this gap without requiring an architecture rewrite.

The 10-item roadmap from the architecture doc remains valid and should be executed in priority order, supplemented by the 5 new items identified here. No roadmap items should be dropped; several should be promoted in priority.

Report generated by Lucienne with specialist subagent analysis. Individual subagent reports available at /tmp/luci-arch-assessment.md, /tmp/luci-gap-assessment.md, /tmp/luci-benchmark-assessment.md.