
PRD: Mission Control Multi-Agent Orchestration Architecture

Owner: Elmar / Luci
Date: 2026-06-01
Status: Draft for independent council planning

1. Problem

We need a reliable architecture for orchestrating a multi-agent software/project workflow using Mission Control (MC), Hermes/Luci, and multiple CLI/LLM runtimes such as Claude Code, Codex CLI, Gemini CLI, Kimi/GLM, and Hermes profiles.

It is currently unclear where the “controller” lives, what should be persistent, what should be scripted, and how work should advance automatically through planning, implementation, review, QA, testing, and validation without Elmar having to ask manually for follow-up.

2. Background

Elmar previously used Lucienne’s older worker orchestration model with multiple agent profiles. That system worked well because when an agent finished a stage, Lucienne/orchestration knew immediately and handed the work to the next stage. Elmar did not need to ask “what is going on?” or “how far are you?”

Mission Control currently has tickets, comments, runtime sessions, tmux-backed interactive runtimes, scheduler/watchdog scripts, and multiple possible runtimes. However, recent Control Room discussions exposed uncertainty about the right architecture:

3. Current Assets / Constraints

Existing assets

Constraints

4. Goals

  1. Define a clear mental model for:
     - Mission Control
     - Luci / Hermes
     - per-ticket controllers, if any
     - global supervisor, if any
     - runtimes/workers
     - workflow gates
     - scripts/watchdogs

  2. Define the lifecycle for a ticket from creation to Done.

  3. Define how a completed stage automatically triggers the next stage.

  4. Define when to use:
     - direct controller work
     - short-lived subagents
     - persistent builder sessions
     - ephemeral reviewers/testers
     - deterministic scripts

  5. Define how Hermes profiles and CLI runtimes should be used.

  6. Define how to preserve interactive continuity with Claude/Codex/Gemini/etc. without overusing resources.

  7. Define a safe incremental implementation plan.

5. Non-Goals

6. Required User Experience

Ticket interaction

Elmar should be able to open/comment on a ticket and expect:

Stage handoff

When a worker/stage finishes:

Resource usage

The system should avoid unnecessary live sessions. It should distinguish durable state from live processes.
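To make the distinction concrete, here is a minimal Python sketch of one way durable state and live processes could be kept apart. It is an illustration under stated assumptions, not a proposed design: `TicketRecord`, `LiveSession`, `idle_sessions`, and all field names are hypothetical and do not come from Mission Control.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LiveSession:
    """A running process. It can be stopped and recreated without losing the ticket."""
    session_name: str      # hypothetical handle, e.g. the name of a tmux session
    last_activity: float   # Unix timestamp of the last observed activity


@dataclass
class TicketRecord:
    """Durable state. It survives restarts and is treated as the source of truth."""
    ticket_id: str
    stage: str
    session: Optional[LiveSession] = None  # None means no live process is attached


def idle_sessions(tickets: List[TicketRecord], now: float, idle_limit_s: float) -> List[TicketRecord]:
    """Return tickets whose live session has been idle longer than idle_limit_s seconds."""
    return [
        t for t in tickets
        if t.session is not None and now - t.session.last_activity > idle_limit_s
    ]
```

In this sketch, stopping an idle session only clears the `session` field. The ticket's stage and history stay in the durable record, so a new session could be attached later.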

7. Questions for Independent Plan Authors

Please propose a fresh architecture from scratch. Do not assume a specific answer.

Your plan should answer:

  1. What is the controller?
  2. Is there one controller globally, one per ticket, or both?
  3. Which parts are scripts/state machines vs LLM agents?
  4. Where does Hermes fit?
  5. Where do Claude Code, Codex CLI, Gemini CLI, Kimi/GLM fit?
  6. What should be persistent vs ephemeral?
  7. How does a ticket advance from plan → build → review → test → validate → done?
  8. How does Elmar interact with the system?
  9. How are stuck tickets detected and recovered?
  10. How would you implement this incrementally with least risk?
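For illustration only, and not as a preferred answer, the stage sequence in question 7 can be read as a small deterministic state machine. The Python sketch below assumes a strictly linear order and a hypothetical failure policy (a failed plan stays in plan, any other failed gate returns the ticket to build). The names `Stage` and `next_stage` are invented for this example, and plan authors are free to propose something different.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    BUILD = "build"
    REVIEW = "review"
    TEST = "test"
    VALIDATE = "validate"
    DONE = "done"


# Enum members iterate in definition order, which matches the order in question 7.
ORDER = list(Stage)


def next_stage(current: Stage, passed: bool) -> Stage:
    """Return the stage a ticket moves to once `current` has finished."""
    if current is Stage.DONE:
        return Stage.DONE  # terminal stage, nothing follows
    if not passed:
        # Assumed policy: re-plan on a failed plan, otherwise go back to build.
        return Stage.PLAN if current is Stage.PLAN else Stage.BUILD
    return ORDER[ORDER.index(current) + 1]
```

A transition function like this needs no LLM. Whether it runs in a script, a per-ticket controller, or a global supervisor is one of the choices the plan should make explicit.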

8. Expected Output From Council Member

Return a plan with: