
Email Archival Redesign: .eml + mu

Context

Current email archival has a critical gap: we don't save the original email. We store:
- Truncated body text (8KB) in email.db
- Attachments separately on OneDrive
- MD envelope metadata

The full original email lives only on Microsoft's servers. If access breaks, everything beyond the first 8KB of each message body is gone.

Elmar wants something like MailStore — a sealed container per email with full search and attachment retrieval. The Unix-standard approach is .eml files (RFC 5322 MIME format) stored in Maildir, indexed by mu.

Proposed Solution

- Phase 1 (Immediate - Day 1): Add .eml download to existing sync pipeline
- Phase 2 (Day 2): Install mu, set up Maildir, enable indexing
- Phase 3 (Day 3-4): Build simple web UI for search
- Phase 4 (Later): Migrate existing emails, decommission old pipeline

Key Design Decisions

| Question | Decision | Rationale |
|---|---|---|
| Where to store .eml? | OneDrive via rclone mount (appears as ~/Maildir/) | 50GB+ email archive lives in cloud, Luci disk safe |
| Keep OneDrive attachments? | Phase out after .eml verified | Attachments inside .eml = no duplicate storage needed |
| Keep email.db? | Yes, for metadata queries | Still useful for project/sender lookups, mu for full-text |
| Web UI now or later? | Later (Day 3-4) | CLI search works immediately, UI is polish |
| Graph API for .eml? | Yes, $value endpoint | GET /me/messages/{id}/$value returns raw MIME |
| Search: mu vs GBrain? | Hybrid | mu = field queries (from:, date:), GBrain = semantic ("emails about fleet") |
| Bulk export? | Mac/Windows Outlook for initial dump | Faster than Graph API for 10-year archive, then transfer to Luci |

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Outlook M365   │────▶│  Graph API       │────▶│  .eml files     │
│  (source)       │     │  (download)      │     │  (OneDrive)     │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                         │
                                            rclone mount │
                                                         ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Web UI        │────▶│   mu index       │────▶│  ~/Maildir/     │
│  (search)       │     │  (~/.cache/mu)   │     │  (local mount)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │   GBrain         │
                       │  (semantic)      │
                       └──────────────────┘

- Storage: .eml files live in OneDrive cloud (50GB+ fine)
- Index: mu database at ~/.cache/mu/ (the default since mu 1.4, unless --muhome is set), ~5% of email volume = ~2.5GB for 50GB archive
- Access: rclone mount streams files on-demand when opened

Phase 1: rclone Mount + .eml Download (Day 1)

1.1 Setup rclone Mount

# Install rclone
curl https://rclone.org/install.sh | sudo bash

# Configure OneDrive (interactive)
rclone config
# Name: onedrive
# Type: onedrive
# Follow browser auth flow

# Create mount point
mkdir -p ~/Maildir

# Test mount
rclone mount onedrive:EmailArchive ~/Maildir --daemon --vfs-cache-mode full

# Add to systemd for persistence
sudo nano /etc/systemd/system/rclone-onedrive.service

systemd service:

[Unit]
Description=rclone OneDrive mount
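Wants=network-online.target
After=network-online.target

[Service]
Type=notify
# Run as the user who owns the mount point and the rclone config
User=lucienne
# No --daemon here: with Type=notify, rclone must stay in the foreground
ExecStart=/usr/bin/rclone mount onedrive:EmailArchive /home/lucienne/Maildir \
  --config /home/lucienne/.config/rclone/rclone.conf \
  --vfs-cache-mode full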
Restart=on-failure

[Install]
WantedBy=multi-user.target

1.2 Bulk Export Option (One-time for 10-year archive)

On Mac/Windows (faster for initial dump):
- Mac: Outlook → Export → .olm file, OR mbsync with IMAP
- Windows: Outlook → Export → .pst file, then convert to .eml

Transfer to Luci:

# From Mac to Luci
rsync -av ~/EmailArchive/ lucienne@100.118.207.3:~/Maildir/

# Or upload to OneDrive directly, let rclone sync down

1.3 Files to Modify

1. ~/workspace/scripts/graph_api.py
   - Add new command: download-message <message-id> --output <path>
   - Uses Graph API endpoint: GET /me/messages/{id}/$value
   - Returns raw MIME content (complete .eml with headers + body + attachments)

import shutil
import urllib.request
from pathlib import Path

# GRAPH_BASE and get_token() are the existing module-level helpers in graph_api.py


def api_download_message(message_id: str, output_path: str) -> None:
    """Download raw MIME content (.eml) for a message."""
    url = f"{GRAPH_BASE}/me/messages/{message_id}/$value"
    token = get_token()
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(req) as response, open(output_path, "wb") as f:
        # Stream to disk so a large message is never held fully in memory
        shutil.copyfileobj(response, f)

Add argparse entry:

subparsers = parser.add_subparsers(dest='command', help='Graph API commands')
dl_parser = subparsers.add_parser('download-message', help='Download raw .eml')
dl_parser.add_argument('message_id', help='Graph API message ID')
dl_parser.add_argument('--output', required=True, help='Output .eml path')

2. ~/workspace/scripts/email_attachment_sync.py
   - Add --save-eml flag
   - After fetching message body, download .eml to Maildir structure
   - Maildir layout (via rclone mount to OneDrive):

     ~/Maildir/                  # rclone mount point
     ├── cur/                    # Read emails
     │   └── {graph_id}.eml      # Actual files in OneDrive cloud
     ├── new/                    # Unread emails
     └── tmp/                    # During delivery

   - Filename: {graph_message_id}.eml (unique, reversible)
   - If email has attachments, they're inside the .eml (MIME-encoded)
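The tmp/ directory in the layout above exists so that a message only appears in cur/ once it is complete. A minimal sketch of that delivery step follows. The function name `deliver_eml` is hypothetical, and the sketch assumes the Graph message ID is safe to use as a filename on the mount, which is worth checking against real IDs.

```python
import os
from pathlib import Path


def deliver_eml(maildir: Path, graph_message_id: str, raw_mime: bytes) -> Path:
    """Write raw MIME bytes to <maildir>/cur/ by way of <maildir>/tmp/.

    Hypothetical helper: assumes graph_message_id contains no path separators.
    """
    for sub in ("cur", "new", "tmp"):
        (maildir / sub).mkdir(parents=True, exist_ok=True)

    name = f"{graph_message_id}.eml"
    tmp_path = maildir / "tmp" / name
    final_path = maildir / "cur" / name

    # Write the whole file under tmp/ first, so an indexer scanning cur/
    # never picks up a half-written message.
    with open(tmp_path, "wb") as f:
        f.write(raw_mime)
        f.flush()
        os.fsync(f.fileno())

    # Move into place; this overwrites an earlier copy with the same ID.
    os.replace(tmp_path, final_path)
    return final_path
```

In the sync pipeline this would sit behind the --save-eml flag, called once per message after the raw MIME has been fetched.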

3. ~/workspace/data/email.db
   - Add column: eml_path TEXT (path to .eml file in Maildir)
   - Update INSERT/UPDATE statements to include this path
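The schema change to email.db can be made idempotent so the sync script is safe to rerun. The sketch below shows one way to do that with the standard sqlite3 module. The table name `emails` and key column `graph_id` are assumptions for illustration, since the real schema is not shown in this report.

```python
import sqlite3


def ensure_eml_path_column(conn: sqlite3.Connection, table: str = "emails") -> None:
    """Add an eml_path TEXT column unless the table already has one.

    The table name is an assumption; adjust it to the real email.db schema.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if "eml_path" not in columns:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN eml_path TEXT")
        conn.commit()


def record_eml_path(conn: sqlite3.Connection, graph_id: str, eml_path: str,
                    table: str = "emails") -> None:
    """Store the Maildir path for one message (graph_id column is assumed)."""
    conn.execute(
        f"UPDATE {table} SET eml_path = ? WHERE graph_id = ?",
        (eml_path, graph_id),
    )
    conn.commit()
```

Calling `ensure_eml_path_column` at startup avoids a separate migration step, and existing rows simply keep a NULL eml_path until their .eml has been downloaded.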

Verification

# 1. Verify rclone mount
df -h | grep Maildir
ls ~/Maildir/  # Should show cur/, new/, tmp/

# 2. Test download for one message
python3 graph_api.py download-message <msg-id> --output ~/Maildir/cur/test.eml

# 3. Verify it's a valid .eml (has headers + body + attachments)
file ~/Maildir/cur/test.eml
# Should show: RFC 822 mail text or "message/rfc822"

# 4. Check attachments are inside
munpack -t ~/Maildir/cur/test.eml

# 5. Verify file sync'd to OneDrive
# Check OneDrive web interface or:
rclone ls onedrive:EmailArchive/cur/
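
Step 4 can also be done from Python, which is useful later for the web UI. The sketch below lists the attachments inside a raw .eml using only the standard library `email` package. The function name `list_attachments` is hypothetical, and it only looks at top-level attachment parts, so inline images nested inside the HTML body are not reported.

```python
from email import message_from_bytes, policy


def list_attachments(raw_mime: bytes) -> list[tuple[str, str, int]]:
    """Return (filename, content type, decoded size in bytes) per attachment."""
    # policy.default gives an EmailMessage, which provides iter_attachments().
    msg = message_from_bytes(raw_mime, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        # decode=True undoes base64 / quoted-printable transfer encoding;
        # it returns None for multipart parts such as attached messages.
        payload = part.get_payload(decode=True) or b""
        found.append(
            (part.get_filename() or "(unnamed)", part.get_content_type(), len(payload))
        )
    return found
```

For the test file above, `list_attachments(open(path, "rb").read())` should name the same files that munpack -t reports.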

Phase 2: Install mu + Indexing (Day 2)

Installation

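# On Debian/Ubuntu, mu is packaged as maildir-utils and munpack ships in mpack
sudo apt install maildir-utils mpack
# Depending on the mu version, the address flag may be --my-address instead
mu init --maildir=~/Maildir --personal-address=elmar@conradie.za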
mu index

About mu Database

Scheduled Indexing

Add to ~/workspace/tasks/email-index.task:

schedule: "*/15 * * * *"  # Every 15 minutes
command: "mu index --quiet"

CLI Search (Immediate Value)

# Find emails from Stephan
mu find from:stephan

# Find emails with attachments about "fleet"
mu find fleet flag:attach

# Find emails from the last 3 months
mu find date:3m..

# Output JSON for web UI
mu find from:elmar --format=json --maxnum=100

Hybrid Search: mu + GBrain

| Use case | Tool | Example |
|---|---|---|
| Field queries | mu | mu find from:stephan date:2024-01.. |
| Attachment search | mu | mu find flag:attach mime:application/pdf |
| Semantic search | GBrain | "emails discussing fleet contracts Q3" |
| Project context | GBrain | "what did Stephan send about Heron?" |

Workflow: GBrain finds email semantically → returns graph_message_id → mu fetches full .eml
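
Because files are named {graph_message_id}.eml, the last step of this workflow can be a direct path lookup rather than a search. A minimal sketch, assuming that naming scheme holds and that messages sit in cur/ or new/ (the helper name `find_eml` is hypothetical):

```python
from pathlib import Path
from typing import Optional


def find_eml(maildir: Path, graph_message_id: str) -> Optional[Path]:
    """Return the .eml path for a Graph message ID, or None if not archived."""
    name = f"{graph_message_id}.eml"
    # Delivered mail lives in cur/ or new/; tmp/ only holds partial writes.
    for sub in ("cur", "new"):
        candidate = maildir / sub / name
        if candidate.is_file():
            return candidate
    return None
```

If the eml_path column in email.db is populated, reading the path from there is an equally valid way to resolve the ID.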

Verification

# Index should build in <2 min for 1000 emails
time mu index

# Search should be <100ms
time mu find from:stephan

# Check database stats
du -sh ~/.cache/mu/  # Default location since mu 1.4; should be ~5% of email archive size

Phase 3: Web UI (Day 3-4)

Simple Flask App

New file: ~/workspace/scripts/email_search_server.py

Features:
- Search bar at top (mu query syntax)
- Results table: date, sender, subject, attachments (yes/no)
- Click row → show email body + attachments list
- Download attachment button (extracts from .eml on-demand)

Routes:
- GET / → search form
- GET /api/search?q=<query> → mu find --format=json
- GET /api/view/<message-id> → mu view <path to .eml> + parse attachments
- GET /api/attach/<message-id>/<part-id> → extract attachment from .eml
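
The search route is essentially a thin wrapper around the mu CLI. The sketch below shows the part that is independent of the web framework: building the argument list, running mu, and parsing its JSON. The function names are hypothetical, and the exact keys inside each result object depend on the mu version, so the sketch passes them through untouched.

```python
import json
import subprocess


def build_mu_find_args(query: str, max_results: int = 100) -> list[str]:
    """Build the argv for mu find; the query is passed as one argument."""
    query = query.strip()
    if not query:
        raise ValueError("empty query")
    if query.startswith("-"):
        # Refuse input that mu could interpret as a command-line option.
        raise ValueError("query must not start with '-'")
    return ["mu", "find", "--format=json", f"--maxnum={max_results}", query]


def parse_mu_json(stdout: str) -> list[dict]:
    """Parse mu's JSON output; empty output is treated as no results."""
    text = stdout.strip()
    return json.loads(text) if text else []


def mu_find(query: str, max_results: int = 100) -> list[dict]:
    # A list argv (no shell) keeps user input from being shell-interpreted.
    # mu exits non-zero when nothing matches, so the status is not checked
    # here; a real service should inspect proc.returncode and proc.stderr.
    proc = subprocess.run(
        build_mu_find_args(query, max_results),
        capture_output=True,
        text=True,
    )
    return parse_mu_json(proc.stdout)
```

A Flask handler for /api/search would then only need to read `q` from the request, call `mu_find`, and return the list as JSON.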

Integration with Mission Control

Add email search widget to MC dashboard at /email-search route.

Verification

# Start server
python3 email_search_server.py --port 3020

# Test search
curl "http://localhost:3020/api/search?q=from:stephan"

# Test view
curl "http://localhost:3020/api/view/<msg-id>"

Phase 4: Migration & Cleanup (Later)

Migrate Existing Emails

Decommission (Optional)

Critical Files Reference

| File | Purpose |
|---|---|
| /etc/systemd/system/rclone-onedrive.service | rclone mount persistence (create new) |
| ~/.config/rclone/rclone.conf | OneDrive credentials for rclone |
| ~/workspace/scripts/graph_api.py | Graph API client: add download-message command |
| ~/workspace/scripts/email_attachment_sync.py | Main sync pipeline: add --save-eml flag |
| ~/workspace/data/email.db | Metadata DB: add eml_path column |
| ~/Maildir/ | rclone mount point (OneDrive appears local) |
| ~/.cache/mu/ | mu's Xapian index database, default location since mu 1.4 (~5% of archive size) |
| ~/workspace/tasks/email-index.task | Scheduled mu indexing (create new) |

Graph API Reference

Download raw MIME (.eml):

GET /me/messages/{message-id}/$value
Authorization: Bearer {token}

Response: Raw RFC 5322 message (complete .eml file)

Existing scopes cover this, so no new permissions are needed:
- Mail.ReadWrite includes reading message content

Risks & Mitigations

| Risk | Mitigation |
|---|---|
| Storage: 50GB+ emails | Lives in OneDrive cloud, not Luci disk. rclone streams on-demand. |
| mu database size | ~5% of archive (2.5GB for 50GB). Totally fine on Luci. |
| rclone mount fails | systemd auto-restart. Email access requires network (expected). |
| Indexing time | mu indexes ~1000 emails/sec. Full 10-year archive <5 min. |
| Graph API rate limits | Download is per-message, same as current attachment fetch. |
| Network latency | VFS cache in rclone (--vfs-cache-mode full) caches accessed files locally. |
| Search: semantic vs keyword | Hybrid: GBrain for semantic, mu for field queries. |
| mu not installed | Simple apt install maildir-utils (the Debian/Ubuntu package that provides mu). Well-maintained package. |
| OneDrive auth expires | rclone handles refresh. Token stored in ~/.config/rclone/rclone.conf. |
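
For the rate-limit row, Graph signals throttling with HTTP 429 and usually a Retry-After header. A per-message download loop can honour that with a small retry wrapper. The sketch below is one possible shape, not the existing pipeline's behaviour. The name `call_with_throttle_retry` and the fallback backoff values are assumptions.

```python
import time
import urllib.error


def call_with_throttle_retry(fetch, max_attempts: int = 5, sleep=time.sleep):
    """Call fetch(); on HTTP 429 or 503, wait and retry up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Anything other than throttling, or the final attempt, is re-raised.
            if err.code not in (429, 503) or attempt == max_attempts:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # server-provided hint, in seconds
            else:
                delay = 2.0 ** attempt  # assumed fallback: exponential backoff
            sleep(delay)
```

With the download function from Phase 1 this would be used as `call_with_throttle_retry(lambda: api_download_message(msg_id, path))`.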

Success Criteria

  1. .eml files downloading and saving to Maildir
  2. mu index completes without errors
  3. mu find returns relevant results for test queries
  4. Web UI shows search results and email bodies
  5. Attachments extractable from .eml on-demand

Timeline Estimate

| Phase | Time |
|---|---|
| Phase 1 (.eml download) | 2-4 hours |
| Phase 2 (mu install + index) | 1-2 hours |
| Phase 3 (web UI) | 6-10 hours |
| Phase 4 (migration) | 2-4 hours |

Total: ~2-3 days for full system