“SecondBrain” is not a folder — it is the vault.db index plus the compiled Brain pages. Obsidian is just one window onto a small, human-friendly corner of it. The index spans the Obsidian vault and raw stores outside it.
Mobile Obsidian stalled on indexing. Root cause confirmed: the email pipeline + sources-sync wrote multi-megabyte Markdown dumps (Excel→MD via markitdown, PDF→MD via gemini-vision) into SecondBrain/sources/. Six finance attachments alone were 5.85–7.91 MB each; the parser chokes on them.
Those email-attachment dumps were already excluded from vault.db (index.py lines 115–119). Not indexed → invisible to the compiler → they served zero purpose. 53 MB of pure dead weight. Moving them out (done 2026-05-20, dot-prefixed to .email-attachments/) lost nothing.
Two indexes were being conflated:
| System | Indexes what | Drives |
|---|---|---|
| Obsidian | Every file in the vault folder (can't selectively exclude beyond dotfiles) | Mobile stall — and your browsing, which you said you barely do |
| vault.db | Whatever index.py scan-roots point at — independent of Obsidian | What Lucienne can retrieve when you ask |
So a file can be indexed for the LLM without living in Obsidian. Move bulky raw dumps to ~/PKA/data/, add that path to index.py scan-roots → still retrievable by Lucienne, never seen by mobile Obsidian. Best of both.
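A minimal sketch of what "add that path to the scan-roots" could look like. The internals of index.py are not shown here, so `SCAN_ROOTS`, `iter_markdown`, and the rule of skipping dot-prefixed paths are all assumptions for illustration, not the actual implementation.

```python
from pathlib import Path

# Hypothetical scan-root list; the real one lives in index.py and may differ.
SCAN_ROOTS = [
    Path.home() / "SecondBrain",      # the Obsidian vault
    Path.home() / "PKA" / "data",     # bulky raw dumps, outside the vault
]


def iter_markdown(roots):
    """Yield every .md file under the given roots, skipping dot-prefixed paths."""
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # a missing root is ignored rather than treated as an error
        for path in sorted(root.rglob("*.md")):
            # Assumption: anything inside a dot-directory (for example
            # .email-attachments/) is deliberately kept out of the index.
            rel_parts = path.relative_to(root).parts
            if any(part.startswith(".") for part in rel_parts):
                continue
            yield path
```

With this shape, a file under `~/PKA/data/` is picked up by the indexer even though Obsidian never sees it, because the two systems walk different trees.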
Decision rule for file location: not “raw vs curated” — it's “do you open it in Obsidian?” You said: almost never. So Obsidian shrinks to a capture surface; everything else is chosen for sync + index convenience.
vault.db index + Brain pages: a concept, not a folder.

“Wiki” for both was confusing. Locked names:
| Name | Path | About | Maintainer |
|---|---|---|---|
| Brain | SecondBrain/wiki/ (path unchanged — verbal name only) | You — projects, people, entities, KYC, FlySafair | compiler |
| System Wiki | ~/PKA/wiki/ | The system — infra, agents, automations, how things work | Atlas |
From here on: “Brain page” = a page about your world. “System Wiki” = how the machinery works.
Flavor = a page template (which sections appear). Not a fixed list of two — it's an open taxonomy; add flavors as new shapes appear. Starting set:
| Example | Page shape | Sections |
|---|---|---|
| fuel-impact | Data-driven, lives on fresh numbers | Current state · key drivers (priced/dated) · recent reports · open questions |
| ncc-overbooking | Legal/dispute narrative | Status · timeline · our position · next deadlines · correspondence |
| elysium-pssa | Deal/agreement | Parties · key terms · documents · decisions pending · status · what changed |
| werda-restructure | Multi-step initiative | Goal · phases · milestones · blockers · next actions |
| medical, estate | Life admin | Key facts · status · documents · review dates |
| Werda Inv, a contact | Reference profile (already live) | Identity · KYC · relationships · holdings · key dates |
Every Brain page carries source back-links. Each fact/section links to the original document it came from — MD path in data/ + original in cloud (Dropbox/OneDrive) + email message-id where relevant. So you (or Lucienne) can always open the source behind any claim. compile.py already stores source paths in frontmatter; we surface them inline as [Source: …] too.
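One possible way to surface frontmatter sources inline, sketched under assumptions: the frontmatter field names (`sources`, `md_path`, `cloud_url`, `message_id`) and the exact marker layout are hypothetical, since the source only says compile.py stores source paths in frontmatter.

```python
def render_sources(frontmatter):
    """Turn frontmatter source entries into inline [Source: ...] markers."""
    markers = []
    for src in frontmatter.get("sources", []):
        parts = [src["md_path"]]                    # MD copy in data/ (always present here)
        if src.get("cloud_url"):
            parts.append(f"cloud: {src['cloud_url']}")   # original in Dropbox/OneDrive
        if src.get("message_id"):
            parts.append(f"msg: {src['message_id']}")    # email message-id, where relevant
        markers.append("[Source: " + " | ".join(parts) + "]")
    return markers
```

Each marker would be appended to the fact or section it supports, so the claim and its origin travel together on the page.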
The PSSA question: Elysium is the umbrella; the PSSA agreement is transaction-sized on its own. Keep pages flat, express hierarchy via metadata — no nested folders.
    # flat files, parent link in registry + wikilink backref
    projects/elysium.md        # umbrella: lists sub-items
    projects/elysium-pssa.md   # own page (transaction flavor), parent: elysium
Registry row gets parent: elysium; the sub-page opens with [[elysium]] backlink; the Elysium page lists its sub-transactions. Flat retrieval, real relationship, no deep trees. Promote a sub-item to its own page whenever it has its own timeline/decisions (PSSA qualifies).
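The parent/child relationship can be derived from flat registry rows alone. The sketch below assumes the registry has already been loaded into a plain dict; the function names are illustrative and not part of compile.py.

```python
from collections import defaultdict

# Registry rows as a plain dict; only the `parent` key matters for this sketch.
projects = {
    "elysium": {},
    "elysium-pssa": {"flavor": "transaction", "parent": "elysium"},
}


def children_by_parent(projects):
    """Map each parent slug to the sorted list of its sub-pages."""
    children = defaultdict(list)
    for slug, row in projects.items():
        parent = row.get("parent")
        if parent:
            children[parent].append(slug)
    return {parent: sorted(slugs) for parent, slugs in children.items()}


def backlink(slug, projects):
    """Return the wikilink a sub-page opens with, or '' for a top-level page."""
    parent = projects[slug].get("parent")
    return f"[[{parent}]]" if parent else ""
```

The umbrella page lists `children_by_parent(projects)["elysium"]`, the sub-page opens with `backlink("elysium-pssa", projects)`, and no folder nesting is involved.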
No silent auto-create. A registry you own gates everything.
    # SecondBrain/wiki/projects/_registry.yaml (you edit this)
    projects:
      fuel-impact:
        status: active        # active | paused | suspended | archived
        flavor: analysis
        seeds: [B4i emails, IATA fuel, X3 GL]
        last_scanned: 2026-05-20
      ncc-overbooking:
        status: active
        flavor: case
        seeds: [legal emails, exco, NCC docs]
        last_scanned: 2026-05-18
- status: active.
- Elmar Inbox/project-candidates.md for your morning triage. Proposes, never creates.
- last_scanned − 1 day (T-1 overlap catches stragglers), not a full rescan. sources_hash idempotency skips unchanged pages.

This is what the fuel-impact page already does with B4i emails; we're formalizing it and adding the case flavor.
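The incremental window and the idempotency check can be sketched as follows. This assumes `sources_hash` is a digest over the page's source files; how compile.py actually computes it is not stated, so the SHA-256 construction and the function names here are illustrative.

```python
import hashlib
from datetime import date, timedelta


def scan_since(last_scanned, overlap_days=1):
    """Start of the incremental window: last_scanned minus the T-1 overlap."""
    return last_scanned - timedelta(days=overlap_days)


def sources_hash(sources):
    """Digest over {path: bytes}; sorted so the result ignores insertion order."""
    digest = hashlib.sha256()
    for path in sorted(sources):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")            # separator so path/content cannot run together
        digest.update(sources[path])
        digest.update(b"\0")
    return digest.hexdigest()


def needs_recompile(stored_hash, sources):
    """A page is rebuilt only when its sources differ from the last compile."""
    return stored_hash != sources_hash(sources)
```

The one-day overlap means a few items are seen twice, which is harmless because an unchanged source set produces the same hash and the page is skipped.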
The compiler's fetch_email_history() is behind an opt-in flag (--with-email, default OFF) and pulls only thread summaries (subject/sender/date/summary) by project keyword — never bodies, never attachments. email.db's entire job is this thin optional feed.
Decision: retire the standing 19 MB email.db sync. Query M365 live at compile time for the project's seed keywords instead. Same result, one less moving part, no DB to sync, and the “history doesn't go all the way back” problem disappears — live search reaches as far back as M365 holds.
“If we pull historic info every time for a project it's unnecessary. I want saved docs so you update status with the latest changes, and I can see what changed and the current status.”
Correct — re-pulling history on every refresh is waste. The model is save once, then incremental:
- ~/PKA/data/ (indexed). The Brain page is built from this full set.
- last_scanned (T-1 overlap). Append new events/decisions, recompute current status, record what changed. Never re-pull the whole history.

So the saved docs are the durable spine; updates are deltas. The page always shows current status + what changed, exactly as you want for the PSSA agreements.
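A rough illustration of the "record what changed" step: compare the previous status snapshot with the recomputed one and emit one line per difference. The snapshot-as-dict shape and the wording of the change lines are assumptions made for this sketch.

```python
def diff_status(previous, current):
    """List human-readable changes between two status snapshots (plain dicts)."""
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old == new:
            continue                      # unchanged fields produce no entry
        if old is None:
            changes.append(f"added {key}: {new}")
        elif new is None:
            changes.append(f"removed {key} (was {old})")
        else:
            changes.append(f"{key}: {old} -> {new}")
    return changes
```

An empty result means the refresh found nothing new, so the page's change history stays untouched. Note that a field explicitly set to `None` is treated the same as a missing field in this simplified version.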
“Bridge history at query time” applies only to ad-hoc deep questions on things never saved (one-off “what did X say in 2019?”). For an active project, history is saved into the data store at create time and never re-fetched. Two different paths — don't conflate.
New agreements to review. On create: save each version (original → cloud, MD → data/elysium/pssa/, indexed). Brain page elysium-pssa.md (transaction flavor) tracks parties, key terms, documents, decisions-pending, status. Each new agreement version → update appends “what changed” + refreshes decisions-pending. You open the page (or ask me) → see current status + change history, every doc back-linked.
~/PKA/reports/ and Luci's ~/workspace/reports/ + ~/workspace/docs/ are not indexed. When you say “find that report we made,” Lucienne currently can't. This is your main consumption surface — and it's invisible to the brain.
| Layer | Path | Role |
|---|---|---|
| HTML artifact | ~/PKA/reports/<date>-<slug>.html | You read it (via dashboard :8787) |
| MD stub | Brain/reports/<date>-<slug>.md | Indexed — frontmatter title/summary/date + link. Lucienne finds it later. |
Cross-machine: Luci's reports + research docs sync into the same indexed tree so both machines' output lands in one searchable brain. This very document is the first instance of the pattern.
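A sketch of generating the MD stub for an HTML report, following the two paths in the table above. The frontmatter keys beyond title/summary/date (here `artifact`) and the function name are assumptions; strings are JSON-quoted so they remain valid YAML scalars.

```python
import json
from datetime import date


def report_stub(report_date, slug, title, summary):
    """Return (stub_path, stub_text) for the HTML report with the same date and slug."""
    name = f"{report_date.isoformat()}-{slug}"
    html_path = f"~/PKA/reports/{name}.html"
    stub_path = f"Brain/reports/{name}.md"
    text = "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"summary: {json.dumps(summary)}",
        f"date: {report_date.isoformat()}",
        f"artifact: {json.dumps(html_path)}",   # hypothetical key pointing at the HTML
        "---",
        "",
        f"Full report: {html_path}",
        "",
    ])
    return stub_path, text
```

The stub stays a few hundred bytes regardless of how large the HTML artifact is, which keeps it cheap to index and harmless inside the vault.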
| Thing | Bulky? | You open it? | Verdict |
|---|---|---|---|
| email-attachments | Very | Never | Moved out (done) |
| sources-sync dumps | Yes | Never | Move → data/sources-md/ |
| Brain pages | No | Rarely | Stay (small, indexed) |
| meetings/ | No | Rarely | Stay (small, compiler source) |
| SB Inbox/ (capture) | Tiny | Yes — capture surface | Stay; weekly sweep into Brain |
| reports (HTML) | — | Yes — main surface | Stay + add indexed MD stub |
| email.db | 19MB DB | Never | Retire → live M365 query |
The stall was caused only by the multi-MB dumps. Small text (Brain, meetings, capture) never stalls — don't over-exile it.
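To check which files are actually large enough to matter before moving anything, a small size audit can help. The 1 MB default threshold is an assumption chosen for illustration; the document only establishes that multi-MB dumps caused the stall.

```python
from pathlib import Path


def oversized_markdown(root, limit_bytes=1_000_000):
    """Return (size, path) for .md files above the limit, largest first."""
    hits = []
    for path in Path(root).rglob("*.md"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit_bytes:
            hits.append((size, path))
    return sorted(hits, reverse=True)
```

Running it against the vault folder lists only the candidates worth exiling; small Brain pages, meeting notes, and capture notes never appear in the result.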
- .email-attachments/ (53 MB) from Obsidian.
- sources-sync → write to ~/PKA/data/sources-md/ (stop new contamination).
- ~/PKA/data/**/*.md to index.py scan-roots (keep moved files retrievable).
- SecondBrain/sources/ (minus meetings) → ~/PKA/data/sources-md/; reindex; verify retrieval.
- SecondBrain/SB Inbox/; set up weekly Lucienne sweep → file durable items into Brain.
- _registry.yaml (with parent: for sub-projects) + Luci project-proposer task → Elmar Inbox/project-candidates.md.
- compile.py; enforce inline source back-links; switch email feed to live M365 query; retire email.db.
- Brain/reports/*.md + rsync Luci reports/docs → data/luci-reports/ (indexed).

- SecondBrain/wiki/: verbal name “Brain” only, no disk rename (it lives inside SecondBrain, renaming adds no value).
- SecondBrain/SB Inbox/. Disambiguates from Elmar Inbox/ and Team Inbox/. You add one note per item (API key, idea, etc.); Lucienne organises and files into the right Brain location.
- data/ for indexing, but the original stays in SB. Lucienne's job = keep SB organised.

reports/ is NOT gitignored. 61 reports are tracked in git on both Mac and Luci. Luci's ~/workspace/PKA is a clone that auto-syncs to the Mac (every ~30 min, last 07:10 today). So anything Luci writes into ~/workspace/PKA/reports/ already lands on the Mac via git. (v4 proposed ignoring reports; that was never applied. Disregard.)
Real gap: Luci writes research to ~/workspace/reports/ and ~/workspace/docs/ — outside the PKA clone — so those don't sync. That's where the orchestrator-flow diagram lives.
- ~/workspace/PKA/reports/ (or rsync ~/workspace/reports ~/workspace/docs → ~/workspace/PKA/reports/). Existing git auto-sync carries PKA → Mac. No Tailscale rsync, no Syncthing needed.
- reports/ is currently NOT a vault.db scan-root, so reports aren't searchable even when synced. Add reports/**/*.md (the MD stubs) to index.py.

Result: Luci report → synced to Mac via existing git → indexed → Lucienne finds it. HTML stays the artifact; the MD stub is what gets indexed (HTML is large/non-text for FTS).