PKA · Design Spec v5 · DRAFT

SecondBrain Architecture

Splitting the human capture surface from the LLM memory index — open flavor set, source back-links, save-once history, and closing the reports gap.
2026-05-21 · author: Lucienne · status: for Elmar review · supersedes optimized-conjuring-church.md (v4)

0 The one-sentence model

“SecondBrain” is not a folder — it is the vault.db index plus the compiled Brain pages. Obsidian is just one window onto a small, human-friendly corner of it. The index spans the Obsidian vault and raw stores outside it.

1 What went wrong

Mobile Obsidian stalled on indexing. Root cause confirmed: the email pipeline + sources-sync wrote multi-megabyte Markdown dumps (Excel→MD via markitdown, PDF→MD via gemini-vision) into SecondBrain/sources/. Six finance attachments alone were 5.85–7.91 MB each; the parser chokes on them.

key finding

Those email-attachment dumps were already excluded from vault.db (index.py lines 115–119). Not indexed → invisible to the compiler → they served zero purpose. 53 MB of pure dead weight. Moving them out (done 2026-05-20, dot-prefixed to .email-attachments/) lost nothing.
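A minimal sketch of how to audit for this kind of dead weight, assuming a plain size check over Markdown files is enough. The function name and the 1 MB default threshold are hypothetical; the only figure taken from the finding above is that the offending dumps were several MB each.

```python
from pathlib import Path


def oversized_markdown(vault: Path, limit_bytes: int = 1_000_000) -> list[tuple[Path, int]]:
    """Return (path, size) for every .md file above limit_bytes, largest first.

    limit_bytes is an assumed threshold, not a number from the spec.
    """
    hits = []
    for path in vault.rglob("*.md"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit_bytes:
            hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Run it against the vault folder before and after a move; an empty result means nothing multi-megabyte is left for mobile Obsidian to parse.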

2 The decoupling insight

Two indexes were being conflated:

| System | Indexes what | Drives |
| --- | --- | --- |
| Obsidian | Every file in the vault folder (can't selectively exclude beyond dotfiles) | Mobile stall, and your browsing, which you said you barely do |
| vault.db | Whatever index.py scan-roots point at, independent of Obsidian | What Lucienne can retrieve when you ask |

So a file can be indexed for the LLM without living in Obsidian. Move bulky raw dumps to ~/PKA/data/, add that path to index.py scan-roots → still retrievable by Lucienne, never seen by mobile Obsidian. Best of both.
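A sketch of what "scan-roots independent of Obsidian" could look like, not the actual index.py. The function name `iter_indexable` and the rule of skipping dot-prefixed paths are assumptions; the two root paths are the ones named in this spec.

```python
from pathlib import Path
from typing import Iterable, Iterator


def iter_indexable(roots: Iterable[Path]) -> Iterator[Path]:
    """Yield Markdown files under each scan-root, skipping dot-prefixed paths."""
    for root in roots:
        if not root.is_dir():
            continue  # a missing root is skipped rather than treated as an error
        for path in sorted(root.rglob("*.md")):
            rel_parts = path.relative_to(root).parts
            if any(part.startswith(".") for part in rel_parts):
                continue  # assumption: dot-prefixed folders stay out of the index
            yield path


# One root inside the Obsidian vault, one outside it.
SCAN_ROOTS = [
    Path.home() / "PKA" / "SecondBrain" / "wiki",
    Path.home() / "PKA" / "data",
]
```

The point of the design is visible in `SCAN_ROOTS`: the index follows whatever list it is given, so a folder can be retrievable without ever sitting in the vault.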

Decision rule for file location: not “raw vs curated” — it's “do you open it in Obsidian?” You said: almost never. So Obsidian shrinks to a capture surface; everything else is chosen for sync + index convenience.

3 Target topology

“SecondBrain” (the brain) = vault.db index + Brain pages (a concept, not a folder)

Obsidian vault: ~/PKA/SecondBrain/
HUMAN SURFACE (small)
  • SB Inbox/ (capture)
  • Brain/ (my memory)
  • meetings/ (small notes)

Raw store: ~/PKA/data/ (indexed, NOT in Obsidian)
LLM-ONLY (bulky)
  • sources-md/ (folder dumps)
  • .email-attachments/ (dead)
  • strategy Excel→MD, saved historic docs

Cold source: M365 / Dropbox / X3 (live, queried on demand)
  • email (live M365 search)
  • Dropbox reporting files
  • X3 GL

vault.db indexes the durable side. index.py scan-roots: Brain, meetings, data/, report stubs.
Reports (HTML) = YOUR consumption surface, served via dashboard :8787 (e.g. fuel-impact report).

4 Terminology: the two wikis

“Wiki” for both was confusing. Locked names:

| Name | Path | About | Maintainer |
| --- | --- | --- | --- |
| Brain | SecondBrain/wiki/ (path unchanged; verbal name only) | You: projects, people, entities, KYC, FlySafair | compiler |
| System Wiki | ~/PKA/wiki/ | The system: infra, agents, automations, how things work | Atlas |

From here on: “Brain page” = a page about your world. “System Wiki” = how the machinery works.

5 Brain pages: open flavor set

Flavor = a page template (which sections appear). Not a fixed list of two — it's an open taxonomy; add flavors as new shapes appear. Starting set:

analysis

e.g. fuel-impact

Data-driven, lives on fresh numbers. Current state · key drivers (priced/dated) · recent reports · open questions.

case

e.g. ncc-overbooking

Legal/dispute narrative. Status · timeline · our position · next deadlines · correspondence.

transaction

e.g. elysium-pssa

Deal/agreement. Parties · key terms · documents · decisions pending · status · what changed.

project

e.g. werda-restructure

Multi-step initiative. Goal · phases · milestones · blockers · next actions.

personal

e.g. medical, estate

Life admin. Key facts · status · documents · review dates.

entity / person

e.g. Werda Inv, a contact

Reference profile. Identity · KYC · relationships · holdings · key dates. (already live)

non-negotiable

Every Brain page carries source back-links. Each fact/section links to the original document it came from — MD path in data/ + original in cloud (Dropbox/OneDrive) + email message-id where relevant. So you (or Lucienne) can always open the source behind any claim. compile.py already stores source paths in frontmatter; we surface them inline as [Source: …] too.
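One way the back-link rule could be enforced is a lint pass over each compiled page. This is a sketch under stated assumptions: pages use Markdown `#` headings to delimit sections, and a section counts as sourced if any line in it contains the literal marker `[Source:`. The function name is hypothetical and this is not part of compile.py today.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def sections_missing_sources(page_text: str) -> list[str]:
    """Return headings whose section has body text but no inline [Source: marker."""
    missing = []
    current = None       # heading of the section being scanned
    has_body = False     # section contains at least one non-blank line
    has_source = False   # section contains at least one [Source: marker
    for line in page_text.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None and has_body and not has_source:
                missing.append(current)
            current, has_body, has_source = match.group(1).strip(), False, False
        elif line.strip():
            has_body = True
            if "[Source:" in line:
                has_source = True
    if current is not None and has_body and not has_source:
        missing.append(current)
    return missing
```

Headings with no body of their own (a page title followed directly by sub-sections) are not flagged, and anything before the first heading, such as frontmatter, is ignored.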

Hierarchy — main project + sub-transactions

The PSSA question: Elysium is the umbrella; the PSSA agreement is transaction-sized on its own. Keep pages flat, express hierarchy via metadata — no nested folders.

# flat files, parent link in registry + wikilink backref
projects/elysium.md          # umbrella — lists sub-items
projects/elysium-pssa.md     # own page (transaction flavor), parent: elysium

Registry row gets parent: elysium; the sub-page opens with [[elysium]] backlink; the Elysium page lists its sub-transactions. Flat retrieval, real relationship, no deep trees. Promote a sub-item to its own page whenever it has its own timeline/decisions (PSSA qualifies).
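A small sketch of how the umbrella page's list of sub-items could be derived from the flat registry. It assumes the registry has already been loaded into a dict keyed by slug; `children_by_parent` is a hypothetical helper, and rejecting an unknown parent is a design choice of this sketch, not a stated requirement.

```python
from collections import defaultdict


def children_by_parent(registry: dict) -> dict[str, list[str]]:
    """Map each parent slug to its sorted child slugs, using the parent: field."""
    tree = defaultdict(list)
    for slug, row in registry.items():
        parent = row.get("parent")
        if parent is None:
            continue  # top-level page
        if parent not in registry:
            raise KeyError(f"{slug}: unknown parent {parent!r}")
        tree[parent].append(slug)
    return {parent: sorted(kids) for parent, kids in tree.items()}


registry = {
    "elysium": {},  # umbrella, no parent
    "elysium-pssa": {"flavor": "transaction", "parent": "elysium"},
}
```

With this registry, `children_by_parent(registry)` returns `{"elysium": ["elysium-pssa"]}`. The files stay flat; the hierarchy lives only in metadata.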

6 Project lifecycle: you control, Luci proposes

No silent auto-create. A registry you own gates everything.

# SecondBrain/wiki/projects/_registry.yaml — you edit this
projects:
  fuel-impact:
    status: active        # active | paused | suspended | archived
    flavor: analysis
    seeds: [B4i emails, IATA fuel, X3 GL]
    last_scanned: 2026-05-20
  ncc-overbooking:
    status: active
    flavor: case
    seeds: [legal emails, exco, NCC docs]
    last_scanned: 2026-05-18

Two-step flow

  1. Create: full historic search across the seeds you name → Lucienne builds the first Brain page.
  2. Update: scheduled compiler refresh. The email scan window starts at last_scanned − 1 day (the T-1 overlap catches stragglers); it is not a full rescan. sources_hash idempotency skips pages whose sources are unchanged.

This is what the fuel-impact page already does with B4i emails — we're formalizing it and adding the case flavor.
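The update step rests on two small mechanisms: the T-1 window and the hash check. The sketch below shows both. How compile.py actually computes sources_hash is not stated here, so the SHA-256 over sorted (path, content) pairs is an assumption, as are the helper names `email_scan_start` and `needs_recompile`.

```python
import hashlib
from datetime import date, timedelta


def email_scan_start(last_scanned: date, overlap_days: int = 1) -> date:
    """First day to scan: last_scanned minus the overlap (T-1 by default)."""
    return last_scanned - timedelta(days=overlap_days)


def sources_hash(sources: dict[str, str]) -> str:
    """Digest over (path, content) pairs; independent of dict ordering."""
    digest = hashlib.sha256()
    for path in sorted(sources):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot run together
        digest.update(sources[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def needs_recompile(sources: dict[str, str], previous_hash: str) -> bool:
    """True only when the source set differs from the one last compiled."""
    return sources_hash(sources) != previous_hash
```

For the fuel-impact row above, `email_scan_start(date(2026, 5, 20))` gives 2026-05-19. The overlap means some emails are seen twice, which is why the hash check (or any dedup) has to sit behind it.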

7 email.db: drop the standing pipeline

finding

The compiler's fetch_email_history() is behind an opt-in flag (--with-email, default OFF) and pulls only thread summaries (subject/sender/date/summary) by project keyword — never bodies, never attachments. email.db's entire job is this thin optional feed.

Decision: retire the standing 19 MB email.db sync. Query M365 live at compile time for the project's seed keywords instead. Same result, one less moving part, no DB to sync, and the “history doesn't go all the way back” problem disappears — live search reaches as far back as M365 holds.
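A sketch of what the live query could look like, assuming it goes through Microsoft Graph's message search; the spec does not say which M365 interface is used, so treat the endpoint choice as an assumption. The code only builds request URLs (one per seed keyword) and leaves authentication and the HTTP call out. `search_urls` and `SUMMARY_FIELDS` are hypothetical names; the selected fields mirror the thin summary feed described above (no bodies, no attachments).

```python
from urllib.parse import quote

GRAPH_MESSAGES = "https://graph.microsoft.com/v1.0/me/messages"

# Summary-level fields only, matching the subject/sender/date/summary feed.
SUMMARY_FIELDS = ("subject", "from", "receivedDateTime", "bodyPreview")


def search_urls(seed_keywords: list[str]) -> list[str]:
    """Build one $search request URL per seed keyword."""
    urls = []
    for keyword in seed_keywords:
        phrase = keyword.replace('"', "")          # keep the quoted phrase well-formed
        search = quote(f'"{phrase}"', safe="")     # percent-encode quotes and spaces
        select = ",".join(SUMMARY_FIELDS)
        urls.append(f"{GRAPH_MESSAGES}?$search={search}&$select={select}")
    return urls
```

One request per keyword keeps the query syntax trivial; merging and de-duplicating the results by message id would happen on the compiler side.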

8 History policy: save once, update incrementally

“If we pull historic info every time for a project it's unnecessary. I want saved docs so you update status with the latest changes, and I can see what changed and the current status.”

Correct — re-pulling history on every refresh is waste. The model is save once, then incremental:

  1. On project create: gather all relevant historic docs once. Original → your cloud folder (Dropbox/OneDrive). MD copy → ~/PKA/data/ (indexed). The Brain page is built from this full set.
  2. On update: only fetch what's new since last_scanned (T-1 overlap). Append new events/decisions, recompute current status, record what changed. Never re-pull the whole history.

So the saved docs are the durable spine; updates are deltas. The page always shows current status + what changed, exactly like you want for the PSSA agreements.
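The delta model can be sketched as a tiny state update. Everything here is illustrative: `Event`, `PageState` and `apply_update` are hypothetical names, and identifying an event by (date, summary) is an assumed dedup key chosen so the T-1 overlap does not double-log items.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Event:
    when: date
    summary: str
    status: Optional[str] = None  # set only when the event changes project status


@dataclass
class PageState:
    status: str
    last_scanned: date
    events: list = field(default_factory=list)
    changelog: list = field(default_factory=list)


def apply_update(page: PageState, new_events: list, scanned_on: date) -> PageState:
    """Append unseen events, recompute status, and record what changed."""
    seen = {(e.when, e.summary) for e in page.events}
    for event in sorted(new_events, key=lambda e: e.when):
        key = (event.when, event.summary)
        if key in seen:
            continue  # the T-1 overlap re-delivers items already on the page
        seen.add(key)
        page.events.append(event)
        page.changelog.append(f"{event.when.isoformat()}: {event.summary}")
        if event.status is not None:
            page.status = event.status
    page.last_scanned = scanned_on
    return page
```

Nothing in `apply_update` touches the saved history: existing events are only read, never re-fetched or rewritten.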

correction to v4 framing

“Bridge history at query time” applies only to ad-hoc deep questions on things never saved (one-off “what did X say in 2019?”). For an active project, history is saved into the data store at create time and never re-fetched. Two different paths — don't conflate.

Worked example — PSSA (Elysium)

New agreements to review. On create: save each version (original → cloud, MD → data/elysium/pssa/, indexed). Brain page elysium-pssa.md (transaction flavor) tracks parties, key terms, documents, decisions-pending, status. Each new agreement version → update appends “what changed” + refreshes decisions-pending. You open the page (or ask me) → see current status + change history, every doc back-linked.

9 Reports: close the gap

gap found

~/PKA/reports/ and Luci's ~/workspace/reports/ + ~/workspace/docs/ are not indexed. When you say “find that report we made,” Lucienne currently can't. This is your main consumption surface — and it's invisible to the brain.

Pattern: HTML artifact + MD stub

| Layer | Path | Role |
| --- | --- | --- |
| HTML artifact | ~/PKA/reports/<date>-<slug>.html | You read it (via dashboard :8787) |
| MD stub | Brain/reports/<date>-<slug>.md | Indexed: frontmatter title/summary/date + link. Lucienne finds it later. |

Cross-machine: Luci's reports + research docs sync into the same indexed tree so both machines' output lands in one searchable brain. This very document is the first instance of the pattern.
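A sketch of stub generation for the pattern above. The frontmatter keys follow the table (title, summary, date, link); the function name `report_stub` is hypothetical, and using JSON string quoting for the YAML values is an assumption made to keep titles with colons or quotes safe.

```python
import json
from datetime import date


def report_stub(day: date, slug: str, title: str, summary: str) -> tuple[str, str]:
    """Return (stub_path, stub_text) for the MD stub that mirrors an HTML report."""
    name = f"{day.isoformat()}-{slug}"
    stub_path = f"Brain/reports/{name}.md"
    html_path = f"~/PKA/reports/{name}.html"
    lines = [
        "---",
        f"title: {json.dumps(title)}",      # JSON strings are valid double-quoted YAML
        f"summary: {json.dumps(summary)}",
        f"date: {day.isoformat()}",
        f"link: {json.dumps(html_path)}",
        "---",
        "",
        summary,
        "",
    ]
    return stub_path, "\n".join(lines)
```

The stub repeats the summary in the body so full-text search has something to match beyond the frontmatter; the HTML itself is never indexed.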

10 What moves, what stays

| Thing | Bulky? | You open it? | Verdict |
| --- | --- | --- | --- |
| email-attachments | Very | Never | Moved out (done) |
| sources-sync dumps | Yes | Never | Move → data/sources-md/ |
| Brain pages | No | Rarely | Stay (small, indexed) |
| meetings/ | No | Rarely | Stay (small, compiler source) |
| SB Inbox/ (capture) | Tiny | Yes, capture surface | Stay; weekly sweep into Brain |
| reports (HTML) | | Yes, main surface | Stay + add indexed MD stub |
| email.db | 19 MB DB | Never | Retire → live M365 query |

The stall was caused only by the multi-MB dumps. Small text (Brain, meetings, capture) never stalls — don't over-exile it.

11 Migration sequence

  1. [done] Hide .email-attachments/ (53 MB) from Obsidian.
  2. Re-target sources-sync → write to ~/PKA/data/sources-md/ (stop new contamination).
  3. Add ~/PKA/data/**/*.md to index.py scan-roots (keep moved files retrievable).
  4. Move existing SecondBrain/sources/ (minus meetings) → ~/PKA/data/sources-md/; reindex; verify retrieval.
  5. Create SecondBrain/SB Inbox/; set up weekly Lucienne sweep → file durable items into Brain.
  6. Stand up _registry.yaml (with parent: for sub-projects) + Luci project-proposer task → Elmar Inbox/project-candidates.md.
  7. Add case / transaction / project / personal flavors to compile.py; enforce inline source back-links; switch email feed to live M365 query; retire email.db.
  8. Build report-stub indexing: Brain/reports/*.md + rsync Luci reports/docs → data/luci-reports/ (indexed).
  9. Close the 4 wiki-compiler gaps (cascade, lint, log discipline, conflict annotation).

12 Decisions made (2026-05-21)

13 Luci reports: already mostly solved

corrected

reports/ is NOT gitignored. 61 reports are tracked in git on both Mac and Luci. Luci's ~/workspace/PKA is a clone that auto-syncs to the Mac (every ~30 min, last 07:10 today). So anything Luci writes into ~/workspace/PKA/reports/ already lands on the Mac via git. (v4 proposed ignoring reports — never applied. Disregard.)

Real gap: Luci writes research to ~/workspace/reports/ and ~/workspace/docs/, outside the PKA clone, so those don't sync. That's where the orchestrator-flow diagram lives.

Fix (two small steps)

  1. Redirect Luci output: write reports/research into ~/workspace/PKA/reports/ (or rsync ~/workspace/reports and ~/workspace/docs into ~/workspace/PKA/reports/). The existing git auto-sync carries PKA → Mac. No Tailscale rsync, no Syncthing needed.
  2. Index it: reports/ is currently NOT a vault.db scan-root, so reports aren't searchable even when synced. Add reports/**/*.md (the MD stubs) to index.py.

Result: Luci report → synced to Mac via existing git → indexed → Lucienne finds it. HTML stays the artifact; the MD stub is what gets indexed (HTML is large/non-text for FTS).