PKA · Design Spec v5 · DRAFT

SecondBrain Architecture

Splitting the human capture surface from the LLM memory index — open flavor set, source back-links, save-once history, and closing the reports gap.

2026-05-21 · author: Lucienne · status: for Elmar review · supersedes optimized-conjuring-church.md (v4)

0 · The one-sentence model

“SecondBrain” is not a folder — it is the vault.db index plus the compiled Brain pages. Obsidian is just one window onto a small, human-friendly corner of it. The index spans the Obsidian vault and raw stores outside it.

1 · What went wrong

Mobile Obsidian stalled on indexing. Root cause confirmed: the email pipeline + sources-sync wrote multi-megabyte Markdown dumps (Excel→MD via markitdown, PDF→MD via gemini-vision) into SecondBrain/sources/. Six finance attachments alone were 5.85–7.91 MB each, and Obsidian's parser chokes on files that size.

key finding

Those email-attachment dumps were already excluded from vault.db (index.py lines 115–119). Not indexed → invisible to the compiler → they served zero purpose. 53 MB of pure dead weight. Moving them out (done 2026-05-20, dot-prefixed to .email-attachments/) lost nothing.

2 · The decoupling insight

Two indexes were being conflated:

System	Indexes what	Drives
Obsidian	Every file in the vault folder (can't selectively exclude beyond dotfiles)	Mobile stall — and your browsing, which you said you barely do
vault.db	Whatever `index.py` scan-roots point at — independent of Obsidian	What Lucienne can retrieve when you ask

So a file can be indexed for the LLM without living in Obsidian. Move bulky raw dumps to ~/PKA/data/, add that path to index.py scan-roots → still retrievable by Lucienne, never seen by mobile Obsidian. Best of both.
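A minimal sketch of the two independent visibility rules, assuming a scan-roots list shaped like the target topology. The real index.py configuration (and its exclusion logic at lines 115–119) may look different; `SCAN_ROOTS`, `obsidian_sees` and `vault_db_indexes` are hypothetical names, and skipping dot-prefixed folders on the vault.db side is an assumption modelled on `.email-attachments/`.

```python
from pathlib import PurePosixPath

# Hypothetical paths mirroring this spec; the real index.py scan-roots may differ.
VAULT = PurePosixPath("~/PKA/SecondBrain")
SCAN_ROOTS = [
    PurePosixPath("~/PKA/SecondBrain/wiki"),      # Brain pages
    PurePosixPath("~/PKA/SecondBrain/meetings"),  # small meeting notes
    PurePosixPath("~/PKA/data"),                  # raw store, outside Obsidian
]


def _has_dot_part(parts: tuple[str, ...]) -> bool:
    return any(part.startswith(".") for part in parts)


def obsidian_sees(path: PurePosixPath) -> bool:
    """Obsidian walks every file in the vault folder except dot-prefixed ones."""
    if not path.is_relative_to(VAULT):
        return False
    return not _has_dot_part(path.relative_to(VAULT).parts)


def vault_db_indexes(path: PurePosixPath) -> bool:
    """vault.db sees any .md file under a scan-root, inside the vault or not.

    Assumption: dot-prefixed folders are skipped here too, as with .email-attachments/.
    """
    in_root = any(path.is_relative_to(root) for root in SCAN_ROOTS)
    return in_root and path.suffix == ".md" and not _has_dot_part(path.parts)
```

A file such as `~/PKA/data/sources-md/budget.md` (hypothetical) fails the first check and passes the second: retrievable by Lucienne, invisible to mobile Obsidian.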

Decision rule for file location: not “raw vs curated” but “do you open it in Obsidian?” You said: almost never. So Obsidian shrinks to a capture surface, and everything else goes wherever syncing and indexing are most convenient.

3 · Target topology

“SecondBrain” (the brain) = vault.db index + Brain pages — concept, not a folder

Tier	Location	Visibility	Contents
HUMAN SURFACE (small)	Obsidian vault: ~/PKA/SecondBrain/	seen by Obsidian	SB Inbox/ (capture), Brain/ (my memory), meetings/ (small notes)
LLM-ONLY (bulky)	raw store: ~/PKA/data/	indexed, NOT in Obsidian	sources-md/ (folder dumps), .email-attachments/ (dead), strategy Excel→MD, saved historic docs
COLD SOURCE	M365 / Dropbox / X3	live, queried on demand	email (live M365 search), Dropbox reporting files, X3 GL

vault.db indexes the durable side — index.py scan-roots: Brain, meetings, data/, report stubs

Reports (HTML) = YOUR consumption surface — served via dashboard :8787 (e.g. fuel-impact report)

4 · Terminology: the two wikis

“Wiki” for both was confusing. Locked names:

Name	Path	About	Maintainer
Brain	`SecondBrain/wiki/` (path unchanged — verbal name only)	You — projects, people, entities, KYC, FlySafair	compiler
System Wiki	`~/PKA/wiki/`	The system — infra, agents, automations, how things work	Atlas

From here on: “Brain page” = a page about your world. “System Wiki” = how the machinery works.

5 · Brain pages: open flavor set

Flavor = a page template (which sections appear). Not a fixed list of two — it's an open taxonomy; add flavors as new shapes appear. Starting set:

Flavor	Example	Shape	Sections
analysis	fuel-impact	Data-driven, lives on fresh numbers	Current state · key drivers (priced/dated) · recent reports · open questions
case	ncc-overbooking	Legal/dispute narrative	Status · timeline · our position · next deadlines · correspondence
transaction	elysium-pssa	Deal/agreement	Parties · key terms · documents · decisions pending · status · what changed
project	werda-restructure	Multi-step initiative	Goal · phases · milestones · blockers · next actions
personal	medical, estate	Life admin	Key facts · status · documents · review dates
entity / person	Werda Inv, a contact	Reference profile (already live)	Identity · KYC · relationships · holdings · key dates
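One way to hold the flavor set as data, so adding a flavor means adding an entry rather than changing code. This is a hypothetical sketch: `FLAVORS`, `page_skeleton` and the `title`/`flavor` frontmatter keys are illustrative and are not compile.py's actual structure; the section names come from the flavor list above.

```python
# Hypothetical encoding of the flavor set: flavor name -> section headings, in page order.
FLAVORS: dict[str, list[str]] = {
    "analysis": ["Current state", "Key drivers", "Recent reports", "Open questions"],
    "case": ["Status", "Timeline", "Our position", "Next deadlines", "Correspondence"],
    "transaction": ["Parties", "Key terms", "Documents", "Decisions pending", "Status", "What changed"],
    "project": ["Goal", "Phases", "Milestones", "Blockers", "Next actions"],
    "personal": ["Key facts", "Status", "Documents", "Review dates"],
    "entity": ["Identity", "KYC", "Relationships", "Holdings", "Key dates"],  # also used for people
}


def page_skeleton(slug: str, flavor: str) -> str:
    """Empty Brain page for a flavor: frontmatter plus one heading per section."""
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor {flavor!r}; add it to FLAVORS first")
    lines = ["---", f"title: {slug}", f"flavor: {flavor}", "---", ""]
    for section in FLAVORS[flavor]:
        lines += [f"## {section}", ""]
    return "\n".join(lines)
```

Adding a new shape later is a one-entry change to `FLAVORS`; the rendering code stays the same.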

non-negotiable

Every Brain page carries source back-links. Each fact/section links to the original document it came from — MD path in data/ + original in cloud (Dropbox/OneDrive) + email message-id where relevant. So you (or Lucienne) can always open the source behind any claim. compile.py already stores source paths in frontmatter; we surface them inline as [Source: …] too.
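A sketch of rendering the inline back-link from the three source locations named above. `SourceRef`, `source_tag` and `with_source` are hypothetical helpers, and the exact `[Source: …]` layout compile.py emits may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceRef:
    md_path: str                       # MD copy in ~/PKA/data/ (indexed)
    cloud_path: Optional[str] = None   # original in Dropbox/OneDrive
    message_id: Optional[str] = None   # email message-id, where relevant


def source_tag(ref: SourceRef) -> str:
    """Inline back-link listing every location the fact can be checked against."""
    parts = [ref.md_path]
    if ref.cloud_path:
        parts.append(f"original: {ref.cloud_path}")
    if ref.message_id:
        parts.append(f"message-id: {ref.message_id}")
    return "[Source: " + " · ".join(parts) + "]"


def with_source(fact: str, ref: SourceRef) -> str:
    """Append the back-link to a single fact or section line."""
    return f"{fact} {source_tag(ref)}"
```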

Hierarchy — main project + sub-transactions

The PSSA question: Elysium is the umbrella; the PSSA agreement is transaction-sized on its own. Keep pages flat, express hierarchy via metadata — no nested folders.

# flat files, parent link in registry + wikilink backref
projects/elysium.md          # umbrella — lists sub-items
projects/elysium-pssa.md     # own page (transaction flavor), parent: elysium

Registry row gets parent: elysium; the sub-page opens with [[elysium]] backlink; the Elysium page lists its sub-transactions. Flat retrieval, real relationship, no deep trees. Promote a sub-item to its own page whenever it has its own timeline/decisions (PSSA qualifies).
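Both directions of the relationship can be derived from the flat registry alone. A hypothetical sketch: the registry is shown as an already-loaded dict, the umbrella's `flavor: project` is illustrative, and the helper names are not from compile.py.

```python
# Hypothetical registry rows, as if loaded from _registry.yaml.
REGISTRY = {
    "elysium": {"status": "active", "flavor": "project"},
    "elysium-pssa": {"status": "active", "flavor": "transaction", "parent": "elysium"},
    "fuel-impact": {"status": "active", "flavor": "analysis"},
}


def children_of(registry: dict, slug: str) -> list[str]:
    """Sub-pages whose registry row names `slug` as parent."""
    return sorted(s for s, row in registry.items() if row.get("parent") == slug)


def parent_backlink(registry: dict, slug: str) -> str:
    """Opening line for a sub-page: a wikilink back to its umbrella."""
    parent = registry[slug].get("parent")
    return f"[[{parent}]]" if parent else ""


def sub_items_section(registry: dict, slug: str) -> str:
    """Section the umbrella page uses to list its sub-transactions."""
    kids = children_of(registry, slug)
    if not kids:
        return ""
    return "\n".join(["## Sub-transactions", *(f"- [[{k}]]" for k in kids)])
```

Files stay flat on disk; the tree exists only in the `parent:` field and the wikilinks rendered from it.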

6 · Project lifecycle: you control, Luci proposes

No silent auto-create. A registry you own gates everything.

# SecondBrain/wiki/projects/_registry.yaml — you edit this
projects:
  fuel-impact:
    status: active        # active | paused | suspended | archived
    flavor: analysis
    seeds: [B4i emails, IATA fuel, X3 GL]
    last_scanned: 2026-05-20
  ncc-overbooking:
    status: active
    flavor: case
    seeds: [legal emails, exco, NCC docs]
    last_scanned: 2026-05-18

You add / pause / suspend / archive rows (or tell Lucienne to).
Compiler only touches status: active.
Luci proposer task (scheduled): scans recent emails/meetings for recurring entities not yet in the registry → drops candidates in Elmar Inbox/project-candidates.md for your morning triage. Proposes, never creates.
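The gating rules above reduce to two small functions. A sketch under stated assumptions: entity extraction from emails and meetings is out of scope (the input is a plain list of mentions), `min_mentions=3` is an arbitrary illustrative threshold, and the checkbox format for project-candidates.md is hypothetical.

```python
from collections import Counter


def compile_targets(registry: dict) -> list[str]:
    """Only rows with status: active are compiled; paused/suspended/archived are left alone."""
    return [slug for slug, row in registry.items() if row.get("status") == "active"]


def propose_candidates(mentions: list[str], registry: dict, min_mentions: int = 3) -> list[str]:
    """Recurring entities not yet in the registry. Read-only: never edits the registry."""
    counts = Counter(mentions)
    return sorted(e for e, n in counts.items() if n >= min_mentions and e not in registry)


def candidates_note(candidates: list[str]) -> str:
    """Body for Elmar Inbox/project-candidates.md (hypothetical checkbox format)."""
    return "# Project candidates\n" + "".join(f"- [ ] {c}\n" for c in candidates)
```

Keeping the proposer's output in a separate note, rather than writing registry rows, is what makes “proposes, never creates” hold mechanically.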

Two-step flow

Create: full historic search across the seeds you name → Lucienne builds the first Brain page.
Update: scheduled compiler refresh. The email scan window starts at last_scanned − 1 day (the T-1 overlap catches stragglers) rather than rescanning everything. sources_hash idempotency skips pages whose sources are unchanged.

This is what the fuel-impact page already does with B4i emails — we're formalizing it and adding the case flavor.
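A sketch of the two update-side mechanics. The one-day overlap follows the spec; the hashing scheme (SHA-256 over sorted per-source digests) is an assumption, and compile.py's real `sources_hash` may be computed differently.

```python
import hashlib
from datetime import date, timedelta
from typing import Optional


def scan_window_start(last_scanned: date, overlap_days: int = 1) -> date:
    """T-1 overlap: re-read from one day before last_scanned to catch stragglers."""
    return last_scanned - timedelta(days=overlap_days)


def sources_hash(source_texts: list[str]) -> str:
    """Order-independent digest of a page's inputs (assumed scheme)."""
    h = hashlib.sha256()
    for digest in sorted(hashlib.sha256(t.encode("utf-8")).digest() for t in source_texts):
        h.update(digest)
    return h.hexdigest()


def needs_recompile(stored_hash: Optional[str], source_texts: list[str]) -> bool:
    """Skip the page when its inputs hash to what was stored last time."""
    return stored_hash != sources_hash(source_texts)
```

With the registry's `last_scanned: 2026-05-20`, the next fuel-impact refresh would read email from 2026-05-19 onward.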

7 · email.db: drop the standing pipeline

finding

The compiler's fetch_email_history() is behind an opt-in flag (--with-email, default OFF) and pulls only thread summaries (subject/sender/date/summary) by project keyword — never bodies, never attachments. email.db's entire job is this thin optional feed.

Decision: retire the standing 19 MB email.db sync. Query M365 live at compile time for the project's seed keywords instead. Same result, one less moving part, no DB to sync, and the “history doesn't go all the way back” problem disappears — live search reaches as far back as M365 holds.
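If the live M365 query goes through Microsoft Graph (an assumption; the spec only says “live M365 search”), a per-keyword search can request just the thin fields the compiler uses today. A sketch with authentication (an OAuth bearer token) and paging omitted; `search_url` and `to_summary` are hypothetical helpers, and Graph's `bodyPreview` (the first 255 characters of the body) stands in for the summary field.

```python
from urllib.parse import quote

GRAPH_MESSAGES = "https://graph.microsoft.com/v1.0/me/messages"
THIN_FIELDS = "subject,from,receivedDateTime,bodyPreview"


def search_url(keyword: str, top: int = 50) -> str:
    """GET URL for a mailbox-wide $search on one seed keyword (no bodies, no attachments)."""
    term = quote(f'"{keyword}"')  # $search values are wrapped in double quotes
    return f"{GRAPH_MESSAGES}?$search={term}&$select={THIN_FIELDS}&$top={top}"


def to_summary(message: dict) -> dict:
    """Map one Graph message resource onto the thin summary shape the compiler used."""
    sender = (message.get("from") or {}).get("emailAddress") or {}
    return {
        "subject": message.get("subject", ""),
        "sender": sender.get("address", ""),
        "date": message.get("receivedDateTime", ""),
        "summary": message.get("bodyPreview", ""),
    }
```

Results come back as a JSON object whose `value` array holds the messages; running one query per seed keyword and filtering on `receivedDateTime` against the scan window client-side keeps the request shape simple.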

8 · History policy: save once, update incrementally

“If we pull historic info every time for a project it's unnecessary. I want saved docs so you update status with the latest changes, and I can see what changed and the current status.”

Correct — re-pulling history on every refresh is waste. The model is save once, then incremental:

On project create: gather all relevant historic docs once. Original → your cloud folder (Dropbox/OneDrive). MD copy → ~/PKA/data/ (indexed). The Brain page is built from this full set.
On update: only fetch what's new since last_scanned (T-1 overlap). Append new events/decisions, recompute current status, record what changed. Never re-pull the whole history.
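Because the T-1 overlap deliberately re-reads a day, an update has to be idempotent at the event level. A minimal sketch, assuming each event carries a stable `source_id` (for example an email message-id or a saved-doc path); `Event` and `merge_delta` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    source_id: str  # stable key, e.g. message-id or saved-doc path (assumed)
    on: date
    text: str


def merge_delta(saved: list[Event], fetched: list[Event]) -> tuple[list[Event], list[Event]]:
    """Return (full timeline, what changed). Saved history is never re-fetched or rewritten."""
    seen = {e.source_id for e in saved}
    changed: list[Event] = []
    for event in sorted(fetched, key=lambda e: e.on):
        if event.source_id not in seen:  # overlap repeats are dropped here
            seen.add(event.source_id)
            changed.append(event)
    timeline = sorted(saved + changed, key=lambda e: e.on)
    return timeline, changed
```

The second return value is exactly the “what changed” block for the page; running the same update twice yields an empty change list.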

So the saved docs are the durable spine; updates are deltas. The page always shows current status + what changed, exactly like you want for the PSSA agreements.

correction to v4 framing

“Bridge history at query time” applies only to ad-hoc deep questions on things never saved (one-off “what did X say in 2019?”). For an active project, history is saved into the data store at create time and never re-fetched. Two different paths — don't conflate.

Worked example — PSSA (Elysium)

New agreements to review. On create: save each version (original → cloud, MD → data/elysium/pssa/, indexed). Brain page elysium-pssa.md (transaction flavor) tracks parties, key terms, documents, decisions-pending, status. Each new agreement version → update appends “what changed” + refreshes decisions-pending. You open the page (or ask me) → see current status + change history, every doc back-linked.
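The “what changed” entry between two agreement versions can start as a key-by-key comparison of their key terms. A hypothetical sketch: it assumes the terms have already been extracted from each version's MD copy into a flat dict, which is the harder step and is not shown.

```python
def what_changed(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Key-term differences between two agreement versions, one line per change."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before == after:
            continue
        if before is None:
            changes.append(f"added {key}: {after}")
        elif after is None:
            changes.append(f"removed {key} (was {before})")
        else:
            changes.append(f"changed {key}: {before} -> {after}")
    return changes
```

Sorting the keys keeps the change list stable across runs, so repeated refreshes of the same pair of versions produce identical “what changed” text.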

9 · Reports: close the gap

gap found

~/PKA/reports/ and Luci's ~/workspace/reports/ + ~/workspace/docs/ are not indexed. When you say “find that report we made,” Lucienne currently can't. This is your main consumption surface — and it's invisible to the brain.

Pattern: HTML artifact + MD stub

Layer	Path	Role
HTML artifact	`~/PKA/reports/<date>-<slug>.html`	You read it (via dashboard :8787)
MD stub	`Brain/reports/<date>-<slug>.md`	Indexed — frontmatter title/summary/date + link. Lucienne finds it later.
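A sketch of generating the indexed stub from an HTML artifact with only the standard library's `html.parser`. The frontmatter keys (`title`, `date`, `summary`, `artifact`) follow the table's description but their exact names are assumptions, and the summary is passed in rather than extracted. Values are written as JSON strings, which are also valid YAML, so a title containing a colon does not break the frontmatter.

```python
import json
from html.parser import HTMLParser


class _TitleGrabber(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "title" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.title += data


def report_stub(html: str, date_str: str, slug: str, summary: str) -> str:
    """Markdown stub for ~/PKA/reports/<date>-<slug>.html; this is what gets indexed."""
    grabber = _TitleGrabber()
    grabber.feed(html)
    title = grabber.title.strip() or slug  # fall back to the slug if there is no <title>
    artifact = f"~/PKA/reports/{date_str}-{slug}.html"
    return "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"date: {date_str}",
        f"summary: {json.dumps(summary)}",
        f"artifact: {artifact}",
        "---",
        "",
        f"Full report: {artifact}",
        "",
    ])
```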

Cross-machine: Luci's reports + research docs sync into the same indexed tree so both machines' output lands in one searchable brain. This very document is the first instance of the pattern.

10 · What moves, what stays

Thing	Bulky?	You open it?	Verdict
email-attachments	Very	Never	Moved out (done)
sources-sync dumps	Yes	Never	Move → data/sources-md/
Brain pages	No	Rarely	Stay (small, indexed)
meetings/	No	Rarely	Stay (small, compiler source)
SB Inbox/ (capture)	Tiny	Yes — capture surface	Stay; weekly sweep into Brain
reports (HTML)	—	Yes — main surface	Stay + add indexed MD stub
email.db	19 MB DB	Never	Retire → live M365 query

The stall was caused only by the multi-MB dumps. Small text (Brain, meetings, capture) never stalls — don't over-exile it.

11 · Migration sequence

1. Done: Hide .email-attachments/ (53 MB) from Obsidian.
2. Re-target sources-sync → write to ~/PKA/data/sources-md/ (stop new contamination).
3. Add ~/PKA/data/**/*.md to index.py scan-roots (keep moved files retrievable).
4. Move existing SecondBrain/sources/ (minus meetings) → ~/PKA/data/sources-md/; reindex; verify retrieval.
5. Create SecondBrain/SB Inbox/; set up weekly Lucienne sweep → file durable items into Brain.
6. Stand up _registry.yaml (with parent: for sub-projects) + Luci project-proposer task → Elmar Inbox/project-candidates.md.
7. Add case / transaction / project / personal flavors to compile.py; enforce inline source back-links; switch email feed to live M365 query; retire email.db.
8. Build report-stub indexing: Brain/reports/*.md + rsync Luci reports/docs → data/luci-reports/ (indexed).
9. Close the 4 wiki-compiler gaps (cascade, lint, log discipline, conflict annotation).

12 · Decisions made (2026-05-21)

Resolved: Brain path stays SecondBrain/wiki/. “Brain” is a verbal name only, with no disk rename (it lives inside SecondBrain, so renaming adds no value).
Resolved: SB Inbox = the Obsidian capture folder, SecondBrain/SB Inbox/. The name disambiguates it from Elmar Inbox/ and Team Inbox/. You add one note per item (API key, idea, etc.); Lucienne organises and files each into the right Brain location.
Resolved: Notes you create never leave SecondBrain. Lucienne may copy them to data/ for indexing, but the original stays in SB. Lucienne's job = keep SB organised.
Resolved: SB Inbox sweep = weekly. Lucienne reads SB Inbox and files durable items into Brain pages so nothing is lost.
Resolved: Google Keep was already imported into SecondBrain. Capture friction was the blocker; the SB Inbox one-note-per-item flow should fix that. No re-import needed.

13 · Luci reports: already mostly solved

corrected

reports/ is NOT gitignored. 61 reports are tracked in git on both Mac and Luci. Luci's ~/workspace/PKA is a clone that auto-syncs to the Mac (every ~30 min; last sync 07:10 today). So anything Luci writes into ~/workspace/PKA/reports/ already lands on the Mac via git. (v4 proposed ignoring reports, but that was never applied. Disregard.)

Real gap: Luci writes research to ~/workspace/reports/ and ~/workspace/docs/ — outside the PKA clone — so those don't sync. That's where the orchestrator-flow diagram lives.

Fix (two small steps)

Redirect Luci output → write reports/research into ~/workspace/PKA/reports/ (or rsync ~/workspace/reports ~/workspace/docs → ~/workspace/PKA/reports/). Existing git auto-sync carries PKA → Mac. No Tailscale rsync, no Syncthing needed.
Index it — reports/ is currently NOT a vault.db scan-root, so reports aren't searchable even when synced. Add reports/**/*.md (the MD stubs) to index.py.

Result: Luci report → synced to Mac via existing git → indexed → Lucienne finds it. HTML stays the artifact; the MD stub is what gets indexed (HTML is large/non-text for FTS).
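Indexing only the stubs comes down to one glob pattern. A sketch, assuming the indexer walks each scan-root with a pattern; `stubs_to_index` is a hypothetical helper, not index.py's actual code.

```python
from pathlib import Path


def stubs_to_index(reports_root: Path) -> list[Path]:
    """MD stubs under reports/ at any depth; HTML artifacts are left out of FTS."""
    if not reports_root.is_dir():
        return []
    return sorted(p for p in reports_root.glob("**/*.md") if p.is_file())
```

`**` matches the root itself plus every subdirectory, so stubs written directly into reports/ and into nested folders are both picked up.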

PKA · SecondBrain Architecture v5 · 2026-05-21 · Lucienne
Source spec: ~/.claude/plans/optimized-conjuring-church.md (v4) · this doc supersedes the SecondBrain-exit + ingestion sections.