Current email archival has a critical gap: we don't save the original email. We store:
- Truncated body text (8KB) in email.db
- Attachments separately on OneDrive
- MD envelope metadata
The full original email lives only on Microsoft's servers. If access breaks, any body text beyond the first 8KB is gone for good.
Elmar wants something like MailStore: a sealed container per email with full search and attachment retrieval. The Unix-standard approach is one .eml file per message (RFC 5322 message format with MIME parts), stored in a Maildir and indexed by mu.
Phase 1 (Immediate - Day 1): Add .eml download to existing sync pipeline
Phase 2 (Day 2): Install mu, set up Maildir, enable indexing
Phase 3 (Day 3-4): Build simple web UI for search
Phase 4 (Later): Migrate existing emails, decommission old pipeline
| Question | Decision | Rationale |
|---|---|---|
| Where to store .eml? | OneDrive via rclone mount (appears as ~/Maildir/) | 50GB+ email archive lives in the cloud; Luci's disk stays safe |
| Keep OneDrive attachments? | Phase out after .eml verified | Attachments are inside the .eml, so no duplicate storage is needed |
| Keep email.db? | Yes, for metadata queries | Still useful for project/sender lookups; mu handles full-text |
| Web UI now or later? | Later (Day 3-4) | CLI search works immediately; the UI is polish |
| Graph API for .eml? | Yes, $value endpoint | GET /me/messages/{id}/$value returns raw MIME |
| Search: mu vs GBrain? | Hybrid | mu = field queries (from:, date:), GBrain = semantic ("emails about fleet") |
| Bulk export? | Mac/Windows Outlook for initial dump | Faster than Graph API for a 10-year archive; then transfer to Luci |
```text
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Outlook M365   │────▶│    Graph API     │────▶│   .eml files    │
│    (source)     │     │    (download)    │     │   (OneDrive)    │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                             rclone mount │
                                                          ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│     Web UI      │────▶│     mu index     │────▶│   ~/Maildir/    │
│    (search)     │     │  (~/.mu database)│     │  (local mount)  │
└─────────────────┘     └─────────┬────────┘     └─────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │      GBrain      │
                        │    (semantic)    │
                        └──────────────────┘
```
Storage: .eml files live in OneDrive cloud (50GB+ fine)
Index: mu database at ~/.mu/ (newer mu releases default to ~/.cache/mu/ instead); estimated at ~5% of email volume, i.e. ~2.5GB for a 50GB archive
Access: rclone mount streams files on-demand when opened
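```bash
# Install rclone
curl https://rclone.org/install.sh | sudo bash

# Configure OneDrive (interactive)
rclone config
# Name: onedrive
# Type: onedrive
# Follow browser auth flow

# Create mount point
mkdir -p ~/Maildir

# Test mount, check it, then unmount before handing it to systemd
rclone mount onedrive:EmailArchive ~/Maildir --daemon --vfs-cache-mode full
ls ~/Maildir
fusermount -u ~/Maildir

# Add to systemd for persistence, then enable it once the unit below is saved
sudo nano /etc/systemd/system/rclone-onedrive.service
sudo systemctl daemon-reload
sudo systemctl enable --now rclone-onedrive.service
```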
systemd service:
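```ini
[Unit]
Description=rclone OneDrive mount
Wants=network-online.target
After=network-online.target

[Service]
# Type=notify: rclone tells systemd when the mount is ready, so it must stay in the foreground (no --daemon)
Type=notify
# Run as the mount owner so the FUSE mount is accessible under /home/lucienne
User=lucienne
# --vfs-cache-max-size 10G is an assumed cap; size it to Luci's free disk
ExecStart=/usr/bin/rclone mount onedrive:EmailArchive /home/lucienne/Maildir \
    --config /home/lucienne/.config/rclone/rclone.conf \
    --vfs-cache-mode full \
    --vfs-cache-max-size 10G
ExecStop=/bin/fusermount -u /home/lucienne/Maildir
Restart=on-failure

[Install]
WantedBy=multi-user.target
```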
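One failure mode the unit cannot catch: if the mount is down when the sync runs, files written to ~/Maildir land on Luci's local disk underneath the empty mount point and never reach OneDrive. A minimal guard the sync script could call before writing, assuming it runs on Luci as the mount's user (require_mount is a hypothetical helper, not existing code):

```python
import os
from pathlib import Path


def require_mount(path: str) -> Path:
    """Return the expanded path if it is an active mount point, else raise.

    Hypothetical helper: call it before writing .eml files so a dead rclone
    mount cannot silently fill the local directory underneath it.
    """
    p = Path(path).expanduser()
    if not os.path.ismount(p):
        raise RuntimeError(f"{p} is not a mount point; is rclone-onedrive.service running?")
    return p
```

For example, maildir = require_mount("~/Maildir") at the start of the --save-eml path makes a broken mount fail loudly instead of quietly.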
On Mac/Windows (faster for initial dump):
- Mac: Outlook → Export → .olm file, OR mbsync with IMAP
- Windows: Outlook → Export → .pst file, then convert to .eml
Transfer to Luci:
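```bash
# ~/EmailArchive must already be in Maildir layout (cur/, new/, tmp/)

# From Mac to Luci
rsync -av ~/EmailArchive/ lucienne@100.118.207.3:~/Maildir/

# Or upload to OneDrive directly; the rclone mount on Luci then shows the files
rclone copy ~/EmailArchive/ onedrive:EmailArchive
```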
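Whichever export route is used, mu needs the result laid out as a Maildir before it is copied over. A sketch using Python's standard mailbox module, assuming the export has already been converted to one .eml file per message in a local folder (import_eml_dir is a hypothetical name). It targets a local directory rather than the rclone mount, since mailbox's delivery may try a hard link, which the mount may not support.

```python
import mailbox
from pathlib import Path


def import_eml_dir(src_dir: str, maildir_path: str) -> int:
    """Copy every *.eml under src_dir into a local Maildir's cur/ as seen mail.

    Sketch: assumes one RFC 5322 message per .eml file. Returns the count imported.
    """
    box = mailbox.Maildir(maildir_path, create=True)  # creates cur/, new/, tmp/ if missing
    count = 0
    try:
        for eml in sorted(Path(src_dir).rglob("*.eml")):
            msg = mailbox.MaildirMessage(eml.read_bytes())
            msg.set_subdir("cur")  # archived mail, not a fresh delivery in new/
            msg.add_flag("S")      # "seen", so old mail does not match flag:unread in mu
            box.add(msg)           # written to tmp/ first, then moved into cur/
            count += 1
    finally:
        box.close()
    return count
```

Usage: import_eml_dir("exported-eml", os.path.expanduser("~/EmailArchive")), then run the rsync or rclone copy above. Marking everything as seen is a choice; drop add_flag("S") to keep imported mail unread.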
1. ~/workspace/scripts/graph_api.py
- Add new command: download-message <message-id> --output <path>
- Uses Graph API endpoint: GET /me/messages/{id}/$value
- Returns raw MIME content (complete .eml with headers + body + attachments)
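```python
import shutil
import urllib.request
from pathlib import Path

# GRAPH_BASE and get_token() are defined elsewhere in graph_api.py

def api_download_message(message_id: str, output_path: str) -> None:
    """Download raw MIME content (.eml) for a message."""
    url = f"{GRAPH_BASE}/me/messages/{message_id}/$value"
    token = get_token()
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    part = out.with_name(out.name + ".part")
    # Stream to a temporary file, then rename, so a failed download never leaves a truncated .eml
    with urllib.request.urlopen(req) as response, open(part, "wb") as f:
        shutil.copyfileobj(response, f)
    part.replace(out)
```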
Add argparse entry:
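```python
# If graph_api.py already creates a subparsers object, reuse it: argparse allows only one add_subparsers() call
subparsers = parser.add_subparsers(dest='command', help='Graph API commands')
dl_parser = subparsers.add_parser('download-message', help='Download raw .eml')
dl_parser.add_argument('message_id', help='Graph API message ID')
dl_parser.add_argument('--output', required=True, help='Output .eml path')

# ...and in the command dispatch after args = parser.parse_args():
if args.command == 'download-message':
    api_download_message(args.message_id, args.output)
```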
2. ~/workspace/scripts/email_attachment_sync.py
- Add --save-eml flag
- After fetching message body, download .eml to Maildir structure
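- Maildir layout (via rclone mount to OneDrive):
```text
~/Maildir/                # rclone mount point
├── cur/                  # Read emails
│   └── {graph_id}.eml    # Actual files in OneDrive cloud
├── new/                  # Unread emails
└── tmp/                  # During delivery
```
- Filename: {graph_message_id}.eml (unique, and maps straight back to the Graph message). Graph IDs change when a message moves between folders unless requests send Prefer: IdType="ImmutableId", so use immutable IDs to keep filenames stable.
- If email has attachments, they're inside the .eml (MIME-encoded)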
3. ~/workspace/data/email.db
- Add column: eml_path TEXT — path to .eml file in Maildir
- Update INSERT/UPDATE statements to include this path
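Putting items 2 and 3 together, the --save-eml step could look roughly like this. It is a sketch under assumptions: the email.db table is called emails with a graph_id column (both hypothetical; the real schema isn't shown here), and the helper names are illustrative. Writing into tmp/ and renaming into cur/ follows the Maildir delivery convention, so mu never indexes a half-written file.

```python
import sqlite3
from pathlib import Path


def deliver_eml(maildir: Path, graph_id: str, raw: bytes) -> Path:
    """Write raw MIME to tmp/, then rename it into cur/ as {graph_id}.eml."""
    if not graph_id or "/" in graph_id:
        raise ValueError(f"unsafe graph_id for a filename: {graph_id!r}")
    for sub in ("cur", "new", "tmp"):
        (maildir / sub).mkdir(parents=True, exist_ok=True)
    tmp_path = maildir / "tmp" / f"{graph_id}.eml"
    final_path = maildir / "cur" / f"{graph_id}.eml"
    tmp_path.write_bytes(raw)
    tmp_path.replace(final_path)  # rename: mu never sees a half-written file in cur/
    return final_path


def ensure_eml_path_column(db: sqlite3.Connection, table: str = "emails") -> None:
    """Add the eml_path TEXT column if missing, so it is safe to run on every sync."""
    columns = {row[1] for row in db.execute(f"PRAGMA table_info({table})")}
    if "eml_path" not in columns:
        db.execute(f"ALTER TABLE {table} ADD COLUMN eml_path TEXT")


def record_eml_path(db: sqlite3.Connection, graph_id: str, path: Path,
                    table: str = "emails") -> None:
    """Store where the .eml lives; assumes a hypothetical graph_id key column."""
    db.execute(f"UPDATE {table} SET eml_path = ? WHERE graph_id = ?", (str(path), graph_id))
    db.commit()
```

In the sync loop: call ensure_eml_path_column(db) once, then for each message pass the bytes from the $value endpoint to deliver_eml and the returned path to record_eml_path.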
```bash
# 1. Verify rclone mount
df -h | grep Maildir
ls ~/Maildir/   # Should show cur/, new/, tmp/

# 2. Test download for one message
python3 graph_api.py download-message <msg-id> --output ~/Maildir/cur/test.eml

# 3. Verify it's a valid .eml (has headers + body + attachments)
file ~/Maildir/cur/test.eml
# Expect a mail type such as "RFC 822 mail text" or "MIME entity text" (depends on the first header)

# 4. Check attachments are inside (munpack comes from the mpack package; parts go to /tmp/eml-test)
mkdir -p /tmp/eml-test && munpack -C /tmp/eml-test -t ~/Maildir/cur/test.eml

# 5. Verify file synced to OneDrive
# Check OneDrive web interface or:
rclone ls onedrive:EmailArchive/cur/
```
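Steps 3 and 4 can also be checked with Python's standard email package instead of file and munpack. A minimal sketch; check_eml is a hypothetical helper, and treating From, Date and Subject as required headers is an assumption:

```python
from email import policy
from email.parser import BytesParser
from typing import List


def check_eml(path: str) -> List[str]:
    """Parse an .eml, check it has headers and a body, return attachment filenames."""
    with open(path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    missing = [h for h in ("From", "Date", "Subject") if msg[h] is None]
    if missing:
        raise ValueError(f"{path}: missing headers {missing}")
    if msg.get_body(preferencelist=("plain", "html")) is None:
        raise ValueError(f"{path}: no text/plain or text/html body")
    # iter_attachments() skips the body part and yields the remaining MIME parts
    return [part.get_filename() or "(unnamed)" for part in msg.iter_attachments()]
```

check_eml(os.path.expanduser("~/Maildir/cur/test.eml")) should return the attachment names Outlook shows for that message.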
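```bash
# On Debian/Ubuntu, mu is packaged as maildir-utils; munpack comes from mpack
sudo apt install maildir-utils mpack
# $HOME instead of ~ because the shell does not expand ~ after --maildir=
mu init --maildir="$HOME/Maildir" --personal-address=elmar@conradie.za
mu index
```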
Index location: ~/.mu/ (Xapian database).

Add to ~/workspace/tasks/email-index.task:
```yaml
schedule: "*/15 * * * *"  # Every 15 minutes
command: "mu index --quiet"
```
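If one run takes longer than 15 minutes (the first full index over the rclone mount, for example), the next scheduled run would compete with it for mu's database write lock. A sketch of a wrapper that skips a run while the previous one is still going, assuming the task runner can call a Python script; run_locked is a hypothetical name and the lock file path is an arbitrary choice:

```python
import fcntl
import subprocess
from typing import List, Optional


def run_locked(cmd: List[str], lock_path: str = "/tmp/mu-index.lock") -> Optional[int]:
    """Run cmd while holding an exclusive lock; skip if a previous run still holds it.

    Returns the command's exit code, or None when the run was skipped.
    """
    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # the previous mu index is still running
        return subprocess.run(cmd).returncode  # lock is released when the file closes
```

The task would then call run_locked(["mu", "index", "--quiet"]). From the shell, util-linux's flock -n /tmp/mu-index.lock mu index --quiet does the same job.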
```bash
# Find emails from Stephan
mu find from:stephan

# Find emails with attachments about "fleet"
mu find fleet flag:attach

# Find emails last 3 months
mu find date:3m..

# Output JSON for web UI
mu find from:elmar --format=json --maxnum=100
```
| Use case | Tool | Example |
|---|---|---|
| Field queries | mu | mu find from:stephan date:2024-01.. |
| Attachment search | mu | mu find flag:attach mime:application/pdf |
| Semantic search | GBrain | "emails discussing fleet contracts Q3" |
| Project context | GBrain | "what did Stephan send about Heron?" |
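Workflow: GBrain finds an email semantically → returns graph_message_id → the full .eml is read from ~/Maildir/cur/{graph_message_id}.eml (e.g. with mu view <path>). mu's own msgid: field matches the Message-ID header, not the Graph ID, so the filename is the link.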
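Because files are named after the Graph ID, the handoff from GBrain needs no mu query: the ID maps straight to a path, which can then go to mu view or the email parser. A sketch (find_eml is a hypothetical helper; it also checks new/ in case files are ever delivered there):

```python
from pathlib import Path
from typing import Optional


def find_eml(maildir: Path, graph_id: str) -> Optional[Path]:
    """Locate {graph_id}.eml in cur/ or new/; None if it has not been archived yet."""
    for sub in ("cur", "new"):
        candidate = maildir / sub / f"{graph_id}.eml"
        if candidate.is_file():
            return candidate
    return None
```

When it returns None, the eml_path column in email.db is the fallback lookup.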
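```bash
# Target: index 1000 emails in <2 min (the first run over the rclone mount is network-bound: every file is read through OneDrive once)
time mu index

# Search should be <100ms
time mu find from:stephan

# Check database stats
du -sh ~/.mu/   # Expect roughly 5% of the email archive size
```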
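The 5% estimate is easy to check once the first index finishes. A sketch that sums file sizes with os.walk; it only stats files, so over the rclone mount it lists directories without downloading message bodies. Pass whichever index directory your mu version uses (~/.mu/ or ~/.cache/mu/); the function names are hypothetical.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total bytes of regular files under path (stat only, contents are not read)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def index_ratio(index_dir: Path, maildir: Path) -> float:
    """Index size as a fraction of archive size; the plan expects roughly 0.05."""
    archive = dir_size(maildir)
    return dir_size(index_dir) / archive if archive else 0.0
```

For a large archive, rclone size onedrive:EmailArchive reports the archive total from OneDrive directly, which avoids walking the mount.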
New file: ~/workspace/scripts/email_search_server.py
Features:
- Search bar at top (mu query syntax)
- Results table: date, sender, subject, attachments (yes/no)
- Click row → show email body + attachments list
- Download attachment button (extracts from .eml on-demand)
Routes:
- GET /: search form
- GET /api/search?q=<query>: mu find --format=json
- GET /api/view/<message-id>: mu view ~/Maildir/cur/<message-id>.eml (mu view takes a file path) + parse attachments
- GET /api/attach/<message-id>/<part-id>: extract attachment from .eml
Add email search widget to MC dashboard at /email-search route.
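The view and attach routes both need to list and decode MIME parts. A sketch with Python's standard email package, assuming part-id is simply the position of the attachment within the message (a hypothetical numbering scheme, not mu's own part numbering, so don't mix the two):

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Dict, List, Tuple


def _parse(raw: bytes) -> EmailMessage:
    return BytesParser(policy=policy.default).parsebytes(raw)


def list_attachments(raw: bytes) -> List[Dict[str, object]]:
    """Attachment metadata for /api/view; part_id is the position among attachments."""
    return [
        {"part_id": i, "filename": part.get_filename(), "content_type": part.get_content_type()}
        for i, part in enumerate(_parse(raw).iter_attachments())
    ]


def extract_attachment(raw: bytes, part_id: int) -> Tuple[str, str, bytes]:
    """Decoded bytes of one attachment for /api/attach/<message-id>/<part-id>."""
    parts = list(_parse(raw).iter_attachments())
    if not 0 <= part_id < len(parts):
        raise IndexError(f"no attachment with part_id {part_id}")
    part = parts[part_id]
    data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
    if data is None:  # e.g. an attached email (message/rfc822) is a nested message
        data = part.get_payload(0).as_bytes()
    filename = part.get_filename() or f"attachment-{part_id}"
    return filename, part.get_content_type(), data
```

In the route handler, read the .eml bytes, call extract_attachment, and send the data with the returned content type and a Content-Disposition: attachment header carrying the filename.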
```bash
# Start server
python3 email_search_server.py --port 3020

# Test search
curl "http://localhost:3020/api/search?q=from:stephan"

# Test view
curl "http://localhost:3020/api/view/<msg-id>"
```
| File | Purpose |
|---|---|
| /etc/systemd/system/rclone-onedrive.service | rclone mount persistence (create new) |
| ~/.config/rclone/rclone.conf | OneDrive credentials for rclone |
| ~/workspace/scripts/graph_api.py | Graph API client: add download-message command |
| ~/workspace/scripts/email_attachment_sync.py | Main sync pipeline: add --save-eml flag |
| ~/workspace/data/email.db | Metadata DB: add eml_path column |
| ~/Maildir/ | rclone mount point (OneDrive appears local) |
| ~/.mu/ | mu's Xapian index database (~5% of archive size) |
| ~/workspace/tasks/email-index.task | Scheduled mu indexing (create new) |
Download raw MIME (.eml):
```http
GET /me/messages/{message-id}/$value
Authorization: Bearer {token}
```
Response: raw RFC 5322 message (complete .eml file)
Existing scopes cover this — no new permissions needed:
- Mail.ReadWrite includes reading message content
| Risk | Mitigation |
|---|---|
| Storage: 50GB+ emails | Lives in OneDrive cloud, not Luci disk. rclone streams on-demand. |
| mu database size | ~5% of archive (2.5GB for 50GB). Totally fine on Luci. |
| rclone mount fails | systemd auto-restart. Email access requires network (expected). |
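| Indexing time | mu indexes ~1000 emails/sec from local disk (full 10-year archive <5 min). The first index over the rclone mount reads every file through OneDrive, so expect it to be network-bound; later runs only process new or changed files. |
| Graph API rate limits | Download is per-message, same as current attachment fetch. |
| Network latency | VFS cache in rclone (--vfs-cache-mode full) caches accessed files locally. Cap it with --vfs-cache-max-size to keep a full index pass from filling Luci's disk. |
| Search: semantic vs keyword | Hybrid: GBrain for semantic, mu for field queries. |
| mu not installed | Simple sudo apt install maildir-utils (the Debian/Ubuntu package that provides mu). Well-maintained package. |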
| OneDrive auth expires | rclone handles refresh. Token stored in ~/.config/rclone/rclone.conf. |
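On the Graph API rate-limit row: Graph signals throttling with HTTP 429 and usually a Retry-After header in seconds. A retry wrapper sketch around the download call; with_graph_retry is a hypothetical name, and treating 503 the same way, the fallback backoff (1, 2, 4... seconds) and the five-attempt cap are assumptions rather than values from the plan:

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_graph_retry(call: Callable[[], T], max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying on HTTP 429/503 and honouring Retry-After (seconds)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as e:
            if e.code not in (429, 503) or attempt == max_attempts - 1:
                raise
            try:
                delay = float(e.headers.get("Retry-After", ""))
            except (AttributeError, TypeError, ValueError):
                delay = float(2 ** attempt)  # no usable Retry-After: back off 1, 2, 4... s
            sleep(delay)
    raise RuntimeError("unreachable")
```

Usage: with_graph_retry(lambda: api_download_message(msg_id, path)). The sleep parameter exists so the retry logic can be tested without waiting.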
Success criteria:
- .eml files downloading and saving to Maildir
- mu index completes without errors
- mu find returns relevant results for test queries

| Phase | Time |
|---|---|
| Phase 1 (.eml download) | 2-4 hours |
| Phase 2 (mu install + index) | 1-2 hours |
| Phase 3 (web UI) | 6-10 hours |
| Phase 4 (migration) | 2-4 hours |
Total: ~2-3 days for full system