
MC-393: Google Drive & OneDrive Search Audit + Improvement Research

Date: 2026-04-08 (updated with head-to-head comparison)
Status: REVIEW (findings for Elmar)


Head-to-Head Comparison (all 6 methods tested)

Queries: "fleet plan" and "fuel hedging", run against all services.

Winner: MS Search API (Work OneDrive)

| Rank | Service | Speed | Content Snippets | File Count | KQL Filters | Verdict |
|---|---|---|---|---|---|---|
| 1 | MS Search API (work) | 687ms | Yes: highlighted, 200+ char summaries with <c0> tags | 231 | Yes: filetype:, date ranges | Best overall |
| 2 | Dropbox search_v2 | 439-642ms | Partial: highlight_spans, but often just title fragments | 5+ (has_more) | filename_only toggle | Good for Dropbox files |
| 3 | GDrive fullText | 704ms | No: just file metadata | 5 | MIME type, date, owner | Basic |
| 4 | Work OneDrive /drive/search | 1282ms | No (searchResult: {} is empty) | 5 | None | Redundant; use MS Search instead |
| 5 | Personal OneDrive /drive/search | 1187ms | No (searchResult: {} is empty) | 5 | None | Only option for personal |
| 6 | MS Search API (personal) | N/A | Not supported ("not supported for MSA accounts") | N/A | N/A | Can't use |

Key Insights

  1. MS Search API is dramatically better than basic OneDrive search. It returns the same files but adds content summaries with highlighted terms, relevance ranking with scores, the KQL query language (filetype:, date ranges), and a total hit count (231, versus 5 per page from the basic endpoint). It was also nearly 2x faster on the work account (687ms vs 1282ms).

  2. MS Search API does NOT work for personal OneDrive accounts (MSA). Only works with work/school (Azure AD) accounts. So personal OneDrive is stuck with the basic /drive/root/search() endpoint.

  3. Dropbox highlight_spans are disappointing in practice. They claim to highlight content but often just return tokenized title fragments ("Proposed", "Fuel", "Hedging" as separate spans) instead of content snippets. Not as rich as MS Search summaries.

  4. Google Drive is the weakest — no content snippets, no ranking, and it returns folders in results (not just files). No way to get a content preview.

  5. MS Search supports KQL: filetype:pdf, date range filters, file size, author. Very powerful for targeted searches.

Recommended Search Strategy for Luci

For any file search, run these in parallel:

1. MS Search API (/search/query) — work OneDrive + SharePoint (BEST)
2. Personal OneDrive /drive/root/search() — personal files
3. Dropbox search_v2 — Dropbox files  
4. Google Drive fullText — GDrive files

Merge results, present MS Search hits first (they have the richest metadata).
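
The fan-out and merge described above could be sketched roughly as follows. This is only an illustration: the `search_all` function, the source names, and the hit shape (plain dicts) are assumptions, and the stub searchers stand in for the real API calls, which are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Display order from the strategy above: MS Search hits come first.
SOURCE_PRIORITY = ["ms_search", "onedrive_personal", "dropbox", "gdrive"]


def search_all(query, searchers):
    """Run every searcher in parallel and merge hits in priority order.

    `searchers` maps a source name to a callable that takes the query and
    returns a list of hit dicts. A source that raises is treated as having
    no hits, so one failing backend does not sink the whole search.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []

    merged = []
    extras = sorted(set(results) - set(SOURCE_PRIORITY))
    for name in SOURCE_PRIORITY + extras:
        for hit in results.get(name, []):
            merged.append({**hit, "source": name})  # tag each hit with its origin
    return merged


# Stub searchers with made-up file names, standing in for real API calls.
def _stub(names):
    return lambda query: [{"name": n} for n in names]


def _broken(query):
    raise RuntimeError("simulated API failure")


demo = search_all(
    "fleet plan",
    {
        "gdrive": _stub(["Fleet Plan.gdoc"]),
        "ms_search": _stub(["Fleet Plan 2026.xlsx"]),
        "dropbox": _broken,
    },
)
print([hit["source"] for hit in demo])
```

Threads are used here because each search is a blocking HTTP round trip; total latency should then be close to the slowest single backend rather than the sum of all four.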


Current State: What Works on Luci

Google Drive Search

| Method | Works | Speed | Searches Inside Files |
|---|---|---|---|
| gws drive files list --params '{"q":"fullText contains \'X\'"}' | Yes | ~1.2s | Yes (Google indexes) |
| Google Drive Python API (via GWS creds at ~/.config/gws/credentials.json) | Yes | ~1.2s | Yes |
| drive.py skill script | No (needs ~/.claude/google/credentials.json, which is missing) | - | - |

OneDrive / SharePoint Search

| Method | Works | Speed | Searches Inside Files |
|---|---|---|---|
| Graph API: /me/drive/root/search(q='X') | Yes | ~1.7s | Yes (basic keyword) |
| MS Search API: POST /v1.0/search/query (driveItem) | Yes | ~0.9s | Yes, with snippets/summaries |
| MS Search API: listItem (SharePoint) | No (needs Sites.Read.All permission) | - | - |
| graph_api.py search | Email only (no OneDrive search command) | - | - |

File Catalog (Lucienne's local PC only)

| Method | Works on Luci | Notes |
|---|---|---|
| search_files.py catalog search | No (~/cowork/tmp/catalog_unified.json not present) | 121K files, local only |
| LanceDB semantic search | No (~/cowork/tmp/file_search.lance not present) | Local only |
| Gety.ai MCP content search | No (desktop app, Mac only) | No server/API mode |

Key Findings

1. MS Search API is an untapped goldmine

The POST /v1.0/search/query endpoint returns:
- Full-text content search across all OneDrive files
- Hit summaries with highlighted snippets (like Google search results)
- 169 results for "portfolio" in ~875ms
- Already authenticated; it just needs a wrapper function in graph_api.py
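
As a rough sketch of what the wrapper would have to do with a response, the snippet below flattens the nested `hitsContainers` structure and strips the `<c0>` highlight markers from each summary. The function names are hypothetical, and the sample payload is made up to mirror the documented response shape; it is not captured output from Luci.

```python
import re

_HIGHLIGHT_RE = re.compile(r"</?c\d+>")   # <c0>...</c0> wrap matched terms
_ELLIPSIS_RE = re.compile(r"<ddd\s*/>")   # <ddd/> marks truncated text


def clean_summary(summary):
    """Turn a tagged hit summary into plain text."""
    text = _HIGHLIGHT_RE.sub("", summary or "")
    text = _ELLIPSIS_RE.sub(" ... ", text)
    return " ".join(text.split())


def extract_hits(response):
    """Flatten value -> hitsContainers -> hits into simple dicts."""
    hits = []
    for item in response.get("value", []):
        for container in item.get("hitsContainers", []):
            for hit in container.get("hits", []):
                resource = hit.get("resource", {})
                hits.append(
                    {
                        "name": resource.get("name"),
                        "url": resource.get("webUrl"),
                        "rank": hit.get("rank"),
                        "snippet": clean_summary(hit.get("summary")),
                    }
                )
    return hits


# Made-up sample payload for illustration only.
sample = {
    "value": [
        {
            "hitsContainers": [
                {
                    "total": 231,
                    "moreResultsAvailable": True,
                    "hits": [
                        {
                            "hitId": "example-id",
                            "rank": 1,
                            "summary": "Proposed <c0>fuel</c0> <c0>hedging</c0> policy<ddd/>",
                            "resource": {
                                "name": "Fuel Hedging.docx",
                                "webUrl": "https://example.invalid/fuel-hedging",
                            },
                        }
                    ],
                }
            ]
        }
    ]
}

for hit in extract_hits(sample):
    print(hit["rank"], hit["name"], "|", hit["snippet"])
```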

2. Google Drive full-text search works but lacks ranking

Google Drive's fullText contains 'X' is keyword-based only. No semantic ranking, no snippets. Still useful for finding files by content.
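
A small helper for building the `q` parameter might look like the sketch below. The function name and the `include_folders` flag are hypothetical; the escaping rule (backslash before single quotes and backslashes) and the folder MIME type follow the Drive query syntax as I understand it, so treat this as an assumption to verify rather than the script's actual behaviour.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def build_drive_query(text, include_folders=False):
    """Build a Drive `q` string for a full-text search.

    Backslashes and single quotes in the search text are escaped so a
    term such as "Elmar's plan" cannot break out of the quoted literal.
    Folders are excluded by default so results contain only files.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    clauses = [f"fullText contains '{escaped}'"]
    if not include_folders:
        clauses.append(f"mimeType != '{FOLDER_MIME}'")
    return " and ".join(clauses)


print(build_drive_query("fuel hedging"))
```

Excluding the folder MIME type in the query itself avoids having to filter folders out of the results afterwards.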

3. LanceDB + MiniLM is already installed on Luci

4. Gety.ai has no server mode

Mac-only desktop app. No API, no headless mode, no Linux support. Cannot replicate on Luci.


Benchmark Results

GDrive API search:    1226ms  (10 results)
OneDrive API search:  1660ms  (10 results)  
MS Search API:         875ms  (738 total, 10 shown) ← best for content search

Recommendations (prioritized)

Priority 1: Wire MS Search API into graph_api.py (effort: 1-2 hours)

Add a search-files command to graph_api.py that calls POST /v1.0/search/query with entityTypes: ['driveItem']. This gives OneDrive full-text search with snippets — for free, no new infra.

python3 graph_api.py search-files "portfolio" --top 10
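
The request body behind such a command could be assembled along these lines. This is a sketch under assumptions: `build_search_request` and its `filetype` and `offset` parameters are hypothetical, and sending the body with the existing access token is left to whatever HTTP helper graph_api.py already uses.

```python
import json

GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query, top=10, filetype=None, offset=0):
    """Build the JSON body for a driveItem search.

    `filetype` is appended as a KQL filter (e.g. filetype:pdf);
    `offset` and `top` map to the request's paging fields.
    """
    query_string = query
    if filetype:
        query_string = f"{query} filetype:{filetype}"
    return {
        "requests": [
            {
                "entityTypes": ["driveItem"],
                "query": {"queryString": query_string},
                "from": offset,
                "size": top,
            }
        ]
    }


body = build_search_request("portfolio", top=10, filetype="pdf")
print("POST", GRAPH_SEARCH_URL)
print(json.dumps(body, indent=2))
```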

Priority 2: Add GDrive search to graph_api.py or fix drive.py creds (effort: 30 min)

Either:
- Add a search-gdrive command to graph_api.py using GWS creds
- Or symlink/copy GWS creds to ~/.claude/google/ so drive.py works

Priority 3: Build a unified search endpoint (effort: 2-3 hours)

Create a search_all.py that queries both APIs in parallel:
1. MS Search API → OneDrive results with snippets
2. GDrive fullText → Google Drive results
3. Merge, deduplicate, present combined results
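
The deduplication step is the least obvious part, since the same file can live in more than one drive. One possible approach, shown as a sketch below, is to key each hit on its case-insensitive file name plus size and keep the first occurrence. That key choice is an assumption (content hashes would be stricter where the APIs expose them), and the hit dicts and file names are made up.

```python
def dedupe_hits(hits):
    """Drop later hits that share a (name, size) key with an earlier one.

    Order is preserved, so if MS Search hits are listed first, their
    richer snippet metadata wins over a duplicate from another drive.
    """
    seen = set()
    unique = []
    for hit in hits:
        key = ((hit.get("name") or "").casefold(), hit.get("size"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(hit)
    return unique


combined = [
    {"source": "ms_search", "name": "Fleet Plan.xlsx", "size": 48210},
    {"source": "gdrive", "name": "fleet plan.xlsx", "size": 48210},
    {"source": "gdrive", "name": "Fleet Plan.xlsx", "size": 51002},
]
for hit in dedupe_hits(combined):
    print(hit["source"], hit["name"], hit["size"])
```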

Priority 4: DIY RAG pipeline with LanceDB (effort: 1-2 days)

Build an ingestion pipeline that:
1. Pulls file metadata + first 500 tokens of content from both drives
2. Embeds with all-MiniLM-L6-v2 (already cached on Luci)
3. Stores in LanceDB (already installed)
4. Runs nightly as a scheduler task
5. Exposes search via vault MCP server

This would give true semantic/similarity search — "find documents about airline fuel costs" would match files that don't literally contain "fuel costs" but discuss related topics.
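
To make the ranking step concrete, here is a toy sketch of cosine-similarity search in plain Python. The three-dimensional vectors and file names are hand-written stand-ins, not real embeddings; in the actual pipeline the vectors would come from all-MiniLM-L6-v2 (384 dimensions) and the nearest-neighbour lookup would be done by LanceDB.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, index, top_k=3):
    """Return the names of the top_k vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


# Toy vectors: the first axis loosely means "fuel costs", the last "staffing".
toy_index = {
    "jet_fuel_hedging_memo.docx": [0.9, 0.1, 0.0],
    "kerosene_price_outlook.pdf": [0.8, 0.3, 0.1],
    "crew_roster.xlsx": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # stand-in for "airline fuel costs"

print(rank(query_vec, toy_index, top_k=2))
```

Neither of the two top results needs to contain the literal query words; they rank highly only because their vectors point in a similar direction to the query vector.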

Estimated index size: 100K documents × 384 dims × 4 bytes = ~150MB. Well within budget.
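
The estimate can be reproduced with a few lines of arithmetic. The helper name is hypothetical, and the figure covers raw float32 vectors only; stored metadata and any index structures would add to it.

```python
def index_size_bytes(num_docs, dims=384, bytes_per_value=4):
    """Raw storage for num_docs float32 vectors of the given dimension."""
    return num_docs * dims * bytes_per_value


size = index_size_bytes(100_000)
print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")  # 153.6 MB (146.5 MiB)
```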

Skip: Self-hosted RAG stacks (Onyx/Danswer)

These need Docker Compose with Postgres + vector DB + web server. Too heavy for 8GB with existing services (MC, Smart Money, WhatsApp, etc.) running.

Skip: Gety.ai alternatives

Mem.ai, Recall.ai are cloud SaaS — not self-hostable. No viable server-side Gety replacement exists.


Gap: What Luci Can't Do That Local PC Can

| Capability | Local PC (Lucienne) | Luci |
|---|---|---|
| Search 121K files by filename | Yes (catalog) | No catalog, but could build one |
| Semantic search by filename meaning | Yes (LanceDB) | Has LanceDB, needs index |
| Search inside file content | Yes (Gety.ai) | Yes: GDrive + OneDrive + Dropbox APIs |
| Search Dropbox files | Yes (catalog + Gety) | Yes: Dropbox API authenticated 2026-04-08 |
| OCR / image search | Yes (Gety.ai) | No, but could use Gemini API |

UPDATE 2026-04-08: Dropbox OAuth completed. All three cloud drives now have working search on Luci:
- Dropbox: search_v2 API with content highlights (~5 results per query, searches filenames + content)
- Google Drive: fullText contains via GWS creds
- OneDrive: MS Search API with snippets
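
For the Dropbox leg, a request body and a helper for reading the highlight spans might look like the sketch below. The field names (`options`, `match_field_options`, `highlight_spans`, `highlight_str`, `is_highlighted`) reflect my reading of the Dropbox HTTP API and should be checked against the current documentation; the function names and the sample match are hypothetical.

```python
DROPBOX_SEARCH_URL = "https://api.dropboxapi.com/2/files/search_v2"


def build_dropbox_search(query, max_results=5, filename_only=False):
    """Build the JSON body for a search_v2 call with highlights enabled."""
    return {
        "query": query,
        "options": {"max_results": max_results, "filename_only": filename_only},
        "match_field_options": {"include_highlights": True},
    }


def highlighted_terms(match):
    """Collect the text of every span flagged as highlighted in one match."""
    return [
        span.get("highlight_str", "")
        for span in match.get("highlight_spans", [])
        if span.get("is_highlighted")
    ]


# Made-up match mirroring the tokenized title fragments described earlier.
sample_match = {
    "highlight_spans": [
        {"highlight_str": "Proposed", "is_highlighted": False},
        {"highlight_str": "Fuel", "is_highlighted": True},
        {"highlight_str": "Hedging", "is_highlighted": True},
    ]
}

print(build_dropbox_search("fuel hedging"))
print(highlighted_terms(sample_match))
```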

The remaining gap is unified search (one query → all three drives) and semantic/similarity search (LanceDB pipeline).


Next Steps

Elmar to decide which priorities to pursue. Priority 1 (MS Search API) is the quickest win — 1-2 hours of work, zero new dependencies, massive improvement for OneDrive search on Luci.