Ticket is done; runtime is closed.·profile claude_opus_1m_medium
Description
MC-4845
Evaluate local coding/grunt-work models on Miki (Mac mini M4 Pro, 24 GB RAM). Scope: install LM Studio/lms if possible, download practical GGUF candidates, benchmark local execution for coding/bash/JSON/tool-style grunt work, inspect memory/CPU behavior on Apple Silicon, then test an OpenAI-compatible endpoint path suitable for Hermes. Expected check-in: today after initial install/download/first model benchmark. Completion gate: concrete model recommendation with tested command/endpoints, observed reliability, and next steps for Hermes integration.
Activity
done
INTERACTIVE
Luci is working...
Details —
Done
· High
· Luci
▼
SState
Done
Closed
PPeople
TTiming / Details▼
api (human)
General
4d ago
4d ago
Advanced / Operator evidence
RRouting owner
ROperator console
Ticket is done; runtime is closed.MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.
WWorkflow
Start Dev Review + QA ▾
Select phases to include:
Agents
Review Gates
Decision
WAT routing: choose an agent, review gate, or decision. Buttons use the live runtime when one is attached.
[visibility-only] Queued message recorded for Hermes Luci. MC did not claim the ticket or spawn a runtime.
api4d ago
Progress update on Miki local model benchmark:
Installed LM Studio 0.4.16+2 arm64 on Miki and verified lms CLI.
Miki facts verified:
- Mac mini, Apple M4 Pro, 14 CPU cores, 24 GB RAM, ~531 GB free disk at start.
Downloaded and tested in LM Studio CLI:
- qwen2.5-coder-14b-instruct, 8.99 GB: reliable load, ~25 tok/s, no swapouts during first benchmark; best correctness so far on Python fix, but sometimes wraps JSON/code in markdown and produced one Linux-centric bash command.
- qwen3-coder-30b-a3b-instruct, 14.58 GB Q3_K_L selected by LM Studio: very fast ~56-58 tok/s, but failed the simple Python bug-fix test by dropping int handling; caused swapouts, so not currently my reliability pick despite speed.
- gpt-oss-20b, 11.62 GB Q4_K_S: fastest ~58-73 tok/s but emits visible <think> reasoning and failed a simple code fix; not suitable as Hermes/grunt default unless we solve reasoning suppression.
- devstral-small-2507, 14.33 GB Q4_K_M: slow ~16-17 tok/s and weaker outputs; not recommended.
LM Studio OpenAI-compatible server test:
- Started server on 127.0.0.1:1234.
- Loaded qwen2.5-coder-14b-instruct as identifier qwen25-coder-14b.
- /v1/models works.
- /v1/chat/completions works; simple code fix returned in ~1.1s.
Current provisional recommendation:
- Best reliable default today: qwen2.5-coder-14b-instruct via LM Studio server.
- Continue testing qwen2.5-coder-32b-instruct Q3_K_L; download is in progress. It may beat 14B on quality if memory pressure stays acceptable.
Next: wait for 32B download, benchmark it, then decide whether the default should be 14B reliable or 32B higher-quality/tighter-memory.
[visibility-only controller] Routed this new Luci-owned technical ticket to internal Hermes Kanban card t_1fadbd52 on board mc-internal (assignee=codexbuilder), running in isolated workspace `/home/lucienne/workspace/state/kanban-worktrees/mc-4845-miki-local-model-benchmark` (verified worker pid 2493440 cwd matches). Scope: benchmark local coding/grunt-work models on Miki, verify LM Studio/lms/OpenAI-compatible endpoint path, and report concrete model recommendations or exact human CTA if Miki access/credentials are required. MC remains the visibility ledger; no MC runtime/pickup endpoints were used.
luci-board-manager4d ago
[visibility-only controller] Gate rejected the first internal card t_1fadbd52 as premature/research-only: it did not verify Miki reachability, install/download a runner/model, or run any benchmark. Created continuation card t_7aae15fd in the same isolated workspace and verified worker pid 2495906 cwd `/home/lucienne/workspace/state/kanban-worktrees/mc-4845-miki-local-model-benchmark`. MC remains parked on the continuation; no MC runtime/pickup endpoints were used.
luci-board-manager4d ago
[visibility-only controller] Gate accepted internal Kanban card t_7aae15fd. Verified the evidence artifact exists at `/home/lucienne/workspace/state/kanban-worktrees/mc-4845-miki-local-model-benchmark/MC-4845-miki-local-model-benchmark.md` and includes actual Miki reachability/SSH proof, LM Studio/lms status, llama.cpp install, GGUF download, OpenAI-compatible endpoint tests, resource snapshots, tunnel test from Luci to Miki, and a concrete recommendation. Outcome: use LM Studio `qwen25-coder-14b` first via a managed SSH tunnel to Miki (`http://127.0.0.1:51234/v1`, model `qwen25-coder-14b`); keep llama.cpp Qwen2.5-Coder-3B Q5_K_M as a lightweight fallback. Closing MC-4845 as done. No MC runtime/pickup endpoints were used.