
Vault & SecondBrain

An interactive explainer — three stores, one index, a compiled wiki. Updated 2026-05-24 (post-v5). Tap the cards, rungs, and buttons below — they react.

The one-sentence version.

Everything Elmar captures and everything Lucienne needs to remember lives as markdown files in the PKA repo. Those files are indexed into one searchable graph (vault.db), and the high-value parts are compiled by AI into a browsable wiki. "Vault" is how Lucienne behaves and remembers; "SecondBrain" is Elmar's projects and life.

4,167 files indexed in vault.db
5,247 source markdown in data/
106 compiled wiki pages
4 scan roots, one DB

1. Three stores — tap to explore

"Vault" is overloaded. There are three distinct stores plus a derived cache. Tap a card to see what's inside and what it's for:

🔐 Personal Vault: ~/.claude/vault/
📒 PKA Vault: ~/PKA/Vault/
🧠 SecondBrain: ~/PKA/SecondBrain/
🗂️ data/ cache: ~/PKA/data/
Rule of thumb: tells Lucienne how to behave/remember → Vault. Tells Elmar what's happening in his projects + life → SecondBrain.

2. The v5 change — sources left Obsidian

Until May 2026 thousands of converted source markdowns lived inside the Obsidian vault. That bloated it (~160 MB) and stalled mobile Obsidian on indexing.

before

5,000+ derived markdowns in SecondBrain/sources/ — bloating the vault, syncing to the phone, mixing machine cache with Elmar's own notes.

after (v5)

Sources moved to ~/PKA/data/sources/: out of Obsidian, gitignored, and still fully indexed via the data scan root. Obsidian dropped from ~160 MB to ~4 MB.

SecondBrain/sources/ no longer exists. Paths resolve through pka_paths.resolve_source_path(). The vault is now just the brain (wiki + notes); data/ is the cache.
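To show the kind of indirection such a resolver provides, here is a minimal sketch. The real signature and behavior of pka_paths.resolve_source_path() are not shown in this report, so the function body, the constants, and the legacy-prefix handling below are all assumptions.

```python
from pathlib import Path, PurePosixPath

# Assumed locations, based on the paths named in this report.
DATA_SOURCES = Path.home() / "PKA" / "data" / "sources"
LEGACY_PREFIX = PurePosixPath("SecondBrain/sources")  # pre-v5 location


def resolve_source_path(ref: str, data_sources: Path = DATA_SOURCES) -> Path:
    """Map a source reference to its post-v5 location under data/sources/.

    Hypothetical sketch: accepts a legacy reference such as
    'SecondBrain/sources/exco/a.md' or a bare relative one such as 'exco/a.md'.
    """
    rel = PurePosixPath(ref)
    try:
        # Strip the legacy prefix if the reference still carries it.
        rel = rel.relative_to(LEGACY_PREFIX)
    except ValueError:
        pass  # already relative to the sources root
    return data_sources.joinpath(*rel.parts)
```

With a resolver like this, old references written before v5 keep working without anyone editing the notes that contain them.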

3. The three layers — how knowledge is built

SecondBrain is a Karpathy-style LLM wiki: capture once, index into a graph, compile into synthesized pages.

Layer 1 · Raw sources
  Meeting notes (211)
  ideas · inbox · scratchpad
  data/sources/ (5,247 MD): exco · emails · family-docs
  ground truth: AI never rewrites

  ↓ index.py

Layer 2 · vault.db
  SQLite · FTS5 + graph
  4,167 files indexed
  path · frontmatter · edges
  6 relation types
  "everything connected to X" in milliseconds

  ↓ compile.py

Layer 3 · The wiki
  106 compiled pages
  projects · people · concepts
  priorities · board-brief · fleet
  synthesized · every claim cited
  one page, not 30 files

The hard work happens at ingest time — so queries are fast, because the synthesis already exists. That's the difference from plain RAG, which re-discovers everything on every query.
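To make "everything connected to X" concrete, here is a minimal sketch of the graph half of Layer 2 using Python's sqlite3 module. The table name, column names, relation names, and sample paths are assumptions for illustration; the actual vault.db schema is not shown in this report.

```python
import sqlite3


def connected_to(conn: sqlite3.Connection, path: str) -> list[tuple[str, str]]:
    """Return (relation, neighbour) pairs for edges touching `path` in either direction."""
    rows = conn.execute(
        """
        SELECT rel, dst AS neighbour FROM edges WHERE src = ?
        UNION
        SELECT rel, src AS neighbour FROM edges WHERE dst = ?
        ORDER BY neighbour
        """,
        (path, path),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, rel TEXT)")
# Indexes on both ends keep lookups fast in either direction.
conn.execute("CREATE INDEX edges_src ON edges (src)")
conn.execute("CREATE INDEX edges_dst ON edges (dst)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [  # made-up sample edges
        ("meetings/2026-05-01.md", "wiki/projects/fleet.md", "project"),
        ("wiki/projects/fleet.md", "wiki/people/example-person.md", "owner"),
        ("ideas/routing.md", "wiki/projects/fleet.md", "wikilink"),
    ],
)
print(connected_to(conn, "wiki/projects/fleet.md"))
```

Because the edges were written at ingest time, the query is a pair of indexed lookups rather than a fresh scan of the markdown.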

4. The indexer — one DB, four roots

index.py (schema v4) scans four roots into the single vault.db. Markdown is source of truth; the DB rebuilds in seconds.

| Root | Path | What's there |
|---|---|---|
| pka | PKA/ | Memory, tasks, notes, projects, CLAUDE.md, team briefs, and reports/ (commissioned analysis). |
| personal | ~/.claude/vault/ | Identity, contacts, entities, preferences, work, infrastructure (secrets excluded). |
| secondbrain | SecondBrain/ | Meetings, ideas, docs, recipes, and the compiled wiki/. |
| data (v5) | ~/PKA/data/ | Out-of-vault source cache. Keeps moved sources indexed without bloating Obsidian. |
Reports are indexed too (2026-05-24): HTML stripped to text — CSS/JS dropped, SVG diagram labels kept — so a commissioned report stays retrievable by search, not just viewable.
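The stripping step could look roughly like the sketch below, built on the standard library's html.parser. This is an illustration and not the indexer's actual code; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ReportTextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <style> and <script>."""

    SKIP = {"style", "script"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # SVG <text> labels arrive here like any other text node, so they are kept.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = ReportTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The resulting text can then be indexed like the body of any markdown file.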
How a file is processed (4 passes)
hash & skip (unchanged files skipped) → frontmatter parse (YAML) → edges (typed frontmatter fields + body wikilinks, code blocks stripped so bash [[ ]] isn't mistaken for a link) → FTS + tags. Rebuild from scratch any time: python index.py --rebuild.
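The first three passes can be sketched as follows. This is a simplified illustration and not index.py itself: the function names are hypothetical, frontmatter is parsed as flat "key: value" lines instead of full YAML, and the FTS + tags pass is omitted.

```python
import hashlib
import re

FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
INLINE_CODE = re.compile(r"`[^`\n]*`")          # inline code spans
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited frontmatter from the body (flat keys only)."""
    if not text.startswith("---\n"):
        return {}, text
    head, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return {}, text
    meta = {}
    for line in head.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            meta[key.strip()] = value.strip()
    return meta, body


def body_wikilinks(body: str) -> list[str]:
    # Strip code first so a shell test like [[ -f x ]] is not read as a link.
    stripped = INLINE_CODE.sub("", FENCE.sub("", body))
    return [target.strip() for target in WIKILINK.findall(stripped)]


def process(path: str, text: str, seen_hashes: dict[str, str]):
    digest = content_hash(text)
    if seen_hashes.get(path) == digest:
        return None                         # pass 1: unchanged, skip
    seen_hashes[path] = digest
    meta, body = split_frontmatter(text)    # pass 2: frontmatter
    links = body_wikilinks(body)            # pass 3: body wikilink edges
    return {"meta": meta, "links": links}
```

Hashing first is what makes an incremental run cheap: an unchanged file costs one digest and nothing else.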

5. How a question gets answered — tap a rung

CLAUDE.md Rule 15. Start at the top, stop when you have the answer. Tap each rung for when to use it.

GBrain (the old semantic layer) was retired 2026-05-01 — a 100%-reliable wiki+FTS combo beat its lock-prone synthesis.

6. Capture → compile

You only ever do two things: record meetings and handle email normally. Everything downstream is automatic.

1. CAPTURE  meeting → Drive → Luci transcribes (Gemini) → meeting-notes skill writes a note
            or email / Exco / PDF / DOCX → convert_to_md.py (markitdown, Gemini-vision fallback)
2. markdown lands in data/sources/<type>/ (or SecondBrain notes for meetings)
3. index.py rebuilds vault.db → searchable
4. dream cycle (nightly, Luci) detects stale wiki pages
5. dedicated compilers refresh curated pages
6. next question answered from the page in seconds, with citations
Ingest channels & schedules
| Channel | Where / when | Lands in |
|---|---|---|
| Exco sync | Mac LaunchAgent, daily 10:00 SAST | data/sources/exco/ |
| Sources-sync (watched folders) | Mac LaunchAgent, daily 10:15 SAST | data/sources/<watch>/ |
| Email sync | Luci, daily 7am SAST | email.db + OneDrive |
| "ingest this" / Team Inbox | on demand | converted, indexed |
| Meeting recorded | overnight | SecondBrain note → index.py |

7. Your playbook — pick what you want to do

You almost never pick a folder. Tap what you're trying to do — see where it goes and whether it stays searchable:

The two inboxes — don't mix them up

📥 Team Inbox/ you → team

Drop files for the team to process. Lucienne triages whatever lands here.

📤 Elmar Inbox/ team → you

Finished work for you to read — reports, research, handoffs. Don't dump scratch here.

Read-only — generated, don't hand-edit

| Don't touch | Why | Edit instead |
|---|---|---|
| data/sources/** | Machine-converted copies, rebuilt from the originals | the original in Dropbox / GDrive |
| vault.db · graphify-out/ | Generated indexes, rebuilt from markdown | nothing; they regenerate |
| A wiki page above its MANUAL OVERRIDES line | Auto-compiled, overwritten on recompile | only below the marker |
| _archived/ · _deleted_/ | Historical snapshots | nothing; leave as history |
Everything else under SecondBrain/ (ideas/, inbox/, Scratchpad/, docs/, your notes) is yours to write freely.
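The MANUAL OVERRIDES rule can be pictured as a merge at recompile time: everything above the marker is replaced, everything from the marker down is carried over. The sketch below is an assumption about how that could work; the marker text and the function name are hypothetical, and the real compiler may differ.

```python
MARKER = "<!-- MANUAL OVERRIDES -->"  # assumed marker text; the real one may differ


def recompile_page(existing: str, compiled: str, marker: str = MARKER) -> str:
    """Replace everything above the marker and keep everything below it."""
    _, found, manual = existing.partition(marker)
    if not found:
        manual = "\n"  # page has no manual section yet, so start an empty one
    return compiled.rstrip("\n") + "\n\n" + marker + manual
```

Under this scheme a hand edit above the marker is silently lost on the next compile, which is why edits belong below it.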

8. The compiler & project registry

📄 Dedicated compilers

compile.py + compile_priorities.py (Tue 12:00) + compile_board_brief.py (Tue 12:30), Mac-side. Each writes a synthesized page with a sources_hash for staleness detection.
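One way a sources_hash can drive staleness detection is sketched below. How the compilers actually compute it is not stated here, so the digest layout and both function names are assumptions.

```python
import hashlib


def sources_hash(sources: dict[str, str]) -> str:
    """Digest over (path, content hash) pairs, independent of dict ordering."""
    h = hashlib.sha256()
    for path in sorted(sources):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(sources[path].encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def is_stale(page_sources_hash: str, current_sources: dict[str, str]) -> bool:
    """A page is stale when its recorded hash no longer matches its current sources."""
    return page_sources_hash != sources_hash(current_sources)
```

A page compiled from a given set of sources stores the digest; if a source changes, is added, or disappears, the digest differs and the page is flagged for recompilation.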

🗒️ Project registry v5

wiki/projects/_registry.yaml — Elmar-owned. status · flavor · seeds · parent. Seeds run as FTS queries so a project finds its sources even when filenames don't contain the project name.
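A minimal sketch of seeds run as FTS queries, using SQLite's FTS5 through Python's sqlite3 module (this assumes the local SQLite build includes FTS5). The registry entry, table layout, file paths, and function name are made up for illustration; the real _registry.yaml and vault.db schema are not shown in this report.

```python
import sqlite3

# Hypothetical registry entry, written as the dict a YAML loader would produce.
registry = {
    "fleet": {"status": "active", "seeds": ['"vehicle tracking"', "telematics"]},
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [  # made-up sample documents
        ("data/sources/exco/2026-04.md", "Update on vehicle tracking rollout."),
        ("SecondBrain/ideas/sensors.md", "Telematics vendors compared."),
        ("SecondBrain/docs/recipes.md", "Sourdough starter notes."),
    ],
)


def sources_for(project: str) -> list[str]:
    """Union of FTS5 matches over all of a project's seeds."""
    hits: set[str] = set()
    for seed in registry[project]["seeds"]:
        rows = conn.execute("SELECT path FROM docs WHERE docs MATCH ?", (seed,))
        hits.update(path for (path,) in rows)
    return sorted(hits)
```

Neither matching path contains the word "fleet"; the project finds them through their content, which is the point of seeding by query.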

9. Across machines — what syncs, what doesn't

GitHub · conrelma/PKA
  markdown source of truth
  push / pull with both machines

🖥️ Lucienne (Mac)
  ✓ SYNCED: Vault/memory · wiki · notes
  ✗ LOCAL: vault.db · data/sources/ (gitignored)
  data/ rebuilt from cloud originals
  vault.db rebuilt from synced markdown
  auto-commit + push every ~30 min

☁️ Luci (Hetzner)
  ✓ SYNCED: same markdown via git-sync (hourly)
  ✗ LOCAL: own vault.db, rebuilt nightly
  dream cycle 2am: pull → rebuild → staleness
  curated compilers; broad recompile on demand
  data-reading compiles co-located on Mac
Git moves only the markdown. Each machine keeps its own vault.db + data/ cache (both gitignored, rebuilt locally). Search is local and fast; the truth is the shared markdown.

10. Related & outstanding

Generated 2026-05-24 · interactive explainer · PKA vault + SecondBrain · counts from live vault.db · all interactions are vanilla JS, no external dependencies