Date: 2026-04-08 (updated with head-to-head comparison)
Status: REVIEW, findings for Elmar
Query: "fleet plan" and "fuel hedging" across all services.
| Rank | Service | Speed | Content Snippets | File Count | KQL Filters | Verdict |
|---|---|---|---|---|---|---|
| 1 | MS Search API (work) | 687ms | Yes: highlighted, 200+ char summaries with `<c0>` tags | 231 | Yes: filetype:, date ranges | Best overall |
| 2 | Dropbox search_v2 | 439-642ms | Partial: highlight_spans, but often just title fragments | 5+ (has_more) | filename_only toggle | Good for Dropbox files |
| 3 | GDrive fullText | 704ms | No: just file metadata | 5 | MIME type, date, owner | Basic |
| 4 | Work OneDrive /drive/search | 1282ms | No (searchResult: {} is empty) | 5 | None | Redundant; use MS Search instead |
| 5 | Personal OneDrive /drive/search | 1187ms | No (searchResult: {} is empty) | 5 | None | Only option for personal |
| 6 | MS Search API (personal) | N/A | Not supported: "not supported for MSA accounts" | N/A | N/A | Can't use |
MS Search API is dramatically better than basic OneDrive search. It covers the same files but adds content summaries with highlighted terms, relevance ranking with scores, the KQL query language (filetype:, date:), and a total hit count (231 total vs 5 per page). It is also roughly 2x faster (687ms vs 1282ms).
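The highlighted summaries come back with inline `<c0>` tags around matched terms, so they need a small cleanup step before display. A minimal sketch, assuming the tags follow the `<cN>…</cN>` pattern seen in the test results and that `<ddd/>` marks truncated text (the function name `render_summary` is hypothetical):

```python
import re

def render_summary(summary: str, mark: str = "**") -> str:
    """Turn MS Search highlight tags into plain-text emphasis markers."""
    # Assumption: <ddd/> is the truncation marker in hit summaries.
    text = summary.replace("<ddd/>", "...")
    # Wrap each highlighted term (<c0>term</c0>) in the chosen marker.
    text = re.sub(
        r"<c\d+>(.*?)</c\d+>",
        lambda m: f"{mark}{m.group(1)}{mark}",
        text,
    )
    return text.strip()
```

Passing `mark=""` strips the highlights entirely, which is useful when the summary feeds another tool rather than a human reader.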
MS Search API does NOT work for personal OneDrive accounts (MSA). Only works with work/school (Azure AD) accounts. So personal OneDrive is stuck with the basic /drive/root/search() endpoint.
Dropbox highlight_spans are disappointing in practice. They claim to highlight content but often just return tokenized title fragments ("Proposed", "Fuel", "Hedging" as separate spans) instead of content snippets. Not as rich as MS Search summaries.
Google Drive is the weakest — no content snippets, no ranking, and it returns folders in results (not just files). No way to get a content preview.
MS Search supports KQL — filetype:pdf, date range filters, file size, author. Very powerful for targeted searches.
For any file search, run these in parallel:
1. MS Search API (/search/query) — work OneDrive + SharePoint (BEST)
2. Personal OneDrive /drive/root/search() — personal files
3. Dropbox search_v2 — Dropbox files
4. Google Drive fullText — GDrive files
Merge results, present MS Search hits first (they have the richest metadata).
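The fan-out above could be sketched roughly as follows. This is an illustration only: each backend is assumed to be a callable that takes the query string and returns a list of dicts, and the backend names (`ms_search`, `onedrive_personal`, `dropbox`, `gdrive`) are placeholders rather than existing functions.

```python
from concurrent.futures import ThreadPoolExecutor

# Presentation order from the plan above: MS Search hits come first.
BACKEND_ORDER = ["ms_search", "onedrive_personal", "dropbox", "gdrive"]

def search_all(query, backends):
    """Run every backend concurrently and return hits grouped by source."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing service should not sink the whole search.
                print(f"{name} failed: {exc}")
                results[name] = []
    # Known backends first, in the preferred order, then any extras.
    order = BACKEND_ORDER + [n for n in results if n not in BACKEND_ORDER]
    merged = []
    for name in order:
        for hit in results.get(name, []):
            merged.append({"source": name, **hit})
    return merged
```

Because the calls are I/O-bound HTTP requests, threads are sufficient here; total latency should approach that of the slowest service rather than the sum of all four.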
| Method | Works | Speed | Searches Inside Files |
|---|---|---|---|
| gws drive files list --params '{"q":"fullText contains \'X\'"}' | Yes | ~1.2s | Yes (Google indexes) |
| Google Drive Python API (via GWS creds at ~/.config/gws/credentials.json) | Yes | ~1.2s | Yes |
| drive.py skill script | No: needs ~/.claude/google/credentials.json (missing) | - | - |

| Method | Works | Speed | Searches Inside Files |
|---|---|---|---|
| Graph API: /me/drive/root/search(q='X') | Yes | ~1.7s | Yes (basic keyword) |
| MS Search API: POST /v1.0/search/query (driveItem) | Yes | ~0.9s | Yes, with snippets/summaries |
| MS Search API: listItem (SharePoint) | No: needs Sites.Read.All permission | - | - |
| graph_api.py search | Email only, no OneDrive search command | - | - |

| Method | Works on Luci | Notes |
|---|---|---|
| search_files.py catalog search | No: ~/cowork/tmp/catalog_unified.json not present | 121K files, local only |
| LanceDB semantic search | No: ~/cowork/tmp/file_search.lance not present | Local only |
| Gety.ai MCP content search | No: desktop app, Mac only | No server/API mode |
The POST /v1.0/search/query endpoint returns:
- Full-text content search across all OneDrive files
- Hit summaries with highlighted snippets (like Google search results)
- 169 results for "portfolio" in ~875ms
- Already authenticated — just needs a wrapper function in graph_api.py
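For reference, a sketch of what the request body and response handling for this endpoint might look like. The helper names are hypothetical, and the response field names (`hitsContainers`, `hits`, `summary`, `resource`) reflect the Graph search response shape as understood here; they should be checked against a live response before use.

```python
def build_search_request(query, top=10, start=0):
    """Build the JSON body for POST /v1.0/search/query (driveItem search)."""
    return {
        "requests": [
            {
                "entityTypes": ["driveItem"],
                # KQL terms such as filetype:pdf can be appended to the query.
                "query": {"queryString": query},
                "from": start,
                "size": top,
            }
        ]
    }

def parse_search_response(payload):
    """Flatten the nested response into (total_count, list_of_hits)."""
    total, hits = 0, []
    for response in payload.get("value", []):
        for container in response.get("hitsContainers", []):
            total += container.get("total", 0)
            for hit in container.get("hits", []):
                resource = hit.get("resource", {})
                hits.append(
                    {
                        "name": resource.get("name"),
                        "url": resource.get("webUrl"),
                        "rank": hit.get("rank"),
                        "summary": hit.get("summary", ""),
                    }
                )
    return total, hits
```

Keeping the request builder and the parser as pure functions means the wrapper in graph_api.py only has to add the authenticated POST call in between.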
Google Drive's fullText contains 'X' is keyword-based only. No semantic ranking, no snippets. Still useful for finding files by content.
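Since the Drive query is a plain string, user input needs escaping before it is interpolated. A small sketch, assuming the standard Drive query syntax; the helper name is hypothetical, and the folder filter addresses the folders-in-results issue noted earlier:

```python
def drive_fulltext_query(term, include_folders=False):
    """Build a Drive `q` string for a full-text keyword search."""
    # Drive queries escape backslashes and single quotes with a backslash.
    escaped = term.replace("\\", "\\\\").replace("'", "\\'")
    q = f"fullText contains '{escaped}'"
    if not include_folders:
        # Exclude folders so only files are returned.
        q += " and mimeType != 'application/vnd.google-apps.folder'"
    q += " and trashed = false"
    return q
```

The resulting string can be passed as the `q` parameter to either the gws CLI or the Python API client.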
- lancedb==0.30.2 installed
- sentence-transformers==5.3.0 installed
- all-MiniLM-L6-v2 model cached (~22MB, 384-dim vectors)

Gety.ai: Mac-only desktop app. No API, no headless mode, no Linux support. Cannot replicate on Luci.
- GDrive API search: 1226ms (10 results)
- OneDrive API search: 1660ms (10 results)
- MS Search API: 875ms (738 total, 10 shown) ← best for content search
Add a search-files command to graph_api.py that calls POST /v1.0/search/query with entityTypes: ['driveItem']. This gives OneDrive full-text search with snippets — for free, no new infra.
python3 graph_api.py search-files "portfolio" --top 10
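How the command is wired in depends on how graph_api.py currently parses its arguments, which is not shown here. Assuming it uses (or could use) argparse subcommands, the new command might be registered along these lines; `build_parser` is a hypothetical name:

```python
import argparse

def build_parser():
    """Sketch of a CLI parser exposing the proposed search-files command."""
    parser = argparse.ArgumentParser(prog="graph_api.py")
    sub = parser.add_subparsers(dest="command", required=True)

    search_files = sub.add_parser(
        "search-files",
        help="Full-text OneDrive search via POST /v1.0/search/query",
    )
    search_files.add_argument("query", help="Search text, may include KQL")
    search_files.add_argument(
        "--top", type=int, default=10, help="Maximum number of hits to show"
    )
    return parser
```

With this in place, `parser.parse_args()` on the example invocation yields `command="search-files"`, `query="portfolio"`, and `top=10`, which the handler can pass on to the search call.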
Either:
- Add a search-gdrive command to graph_api.py using GWS creds
- Or symlink/copy GWS creds to ~/.claude/google/ so drive.py works
Create a search_all.py that queries both APIs in parallel:
1. MS Search API → OneDrive results with snippets
2. GDrive fullText → Google Drive results
3. Merge, deduplicate, present combined results
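The deduplication step needs some notion of "same file" across services. One possible heuristic, offered as an assumption rather than a tested rule, is to treat hits with the same case-insensitive filename and byte size as duplicates and keep the first one seen:

```python
def dedupe_hits(hits):
    """Drop later hits that share a (filename, size) key with an earlier hit."""
    seen = set()
    unique = []
    for hit in hits:
        # Assumed hit shape: dicts with optional "name" and "size" keys.
        key = ((hit.get("name") or "").casefold(), hit.get("size"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(hit)
    return unique
```

Because the first occurrence wins, feeding in MS Search hits first preserves the copy with the richest metadata. A content hash would be a stricter key where the APIs expose one.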
Build an ingestion pipeline that:
1. Pulls file metadata + first 500 tokens of content from both drives
2. Embeds with all-MiniLM-L6-v2 (already cached on Luci)
3. Stores in LanceDB (already installed)
4. Runs nightly as a scheduler task
5. Exposes search via vault MCP server
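Steps 1 to 3 could be prototyped as below. This is a sketch under stated assumptions: files arrive as dicts with `id`, `source`, `name`, and `content` keys (a hypothetical shape), "tokens" are approximated by whitespace-separated words, and the embedding function is injected so the code runs without the model loaded.

```python
def first_tokens(text, limit=500):
    """Keep roughly the first `limit` tokens (whitespace words as a proxy)."""
    return " ".join(text.split()[:limit])

def build_records(files, embed, limit=500):
    """Turn file dicts into rows ready for a vector table.

    `embed` is any callable mapping a string to a sequence of floats.
    In production this could be SentenceTransformer("all-MiniLM-L6-v2").encode,
    and the returned rows could be written to a LanceDB table.
    """
    records = []
    for f in files:
        snippet = first_tokens(f.get("content", ""), limit)
        # Embed filename and content together so both influence similarity.
        vector = list(embed(f"{f['name']}\n{snippet}"))
        records.append(
            {
                "id": f["id"],
                "source": f["source"],
                "name": f["name"],
                "text": snippet,
                "vector": vector,
            }
        )
    return records
```

Injecting `embed` keeps the nightly job testable with a stub and makes it easy to swap models later without touching the ingestion logic.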
This would give true semantic/similarity search — "find documents about airline fuel costs" would match files that don't literally contain "fuel costs" but discuss related topics.
Estimated index size: 100K documents × 384 dims × 4 bytes = ~150MB. Well within budget.
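The estimate can be checked with a quick calculation. It covers the raw float32 vectors only; stored text, metadata, and any index structures would add to it (the function name is illustrative):

```python
def vector_storage_bytes(docs, dims, bytes_per_value=4):
    """Raw size of the embedding vectors, excluding metadata and indexes."""
    return docs * dims * bytes_per_value

raw = vector_storage_bytes(100_000, 384)
print(raw / 1e6)  # 153.6 (MB)
```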
These need Docker Compose with Postgres + vector DB + web server. Too heavy for 8GB with existing services (MC, Smart Money, WhatsApp, etc.) running.
Mem.ai, Recall.ai are cloud SaaS — not self-hostable. No viable server-side Gety replacement exists.
| Capability | Local PC (Lucienne) | Luci |
|---|---|---|
| Search 121K files by filename | Yes (catalog) | No catalog — could build one |
| Semantic search by filename meaning | Yes (LanceDB) | Has LanceDB — needs index |
| Search inside file content | Yes (Gety.ai) | Yes — GDrive + OneDrive + Dropbox APIs |
| Search Dropbox files | Yes (catalog + Gety) | Yes — Dropbox API authenticated 2026-04-08 |
| OCR / image search | Yes (Gety.ai) | No — could use Gemini API |
UPDATE 2026-04-08: Dropbox OAuth completed. All three cloud drives now have working search on Luci:
- Dropbox: search_v2 API with content highlights (~5 results per query, searches filenames + content)
- Google Drive: fullText contains via GWS creds
- OneDrive: MS Search API with snippets
The remaining gap is unified search (one query → all three drives) and semantic/similarity search (LanceDB pipeline).
Elmar to decide which priorities to pursue. Priority 1 (MS Search API) is the quickest win — 1-2 hours of work, zero new dependencies, massive improvement for OneDrive search on Luci.