MC-4845 — Benchmark local coding models on Miki

Home Board MC-4845

Benchmark local coding models on Miki

Evaluate local coding/grunt-work models on Miki (Mac mini M4 Pro, 24 GB RAM). Scope: install LM Studio/lms if possible, download practical GGUF candidates, benchmark local execu...

State Done Next Action Closed Owner Luci Runtime Closed Age 4d ago

← MC-4845

Ticket is done; runtime is closed. · profile claude_opus_1m_medium

Description

MC-4845

Evaluate local coding/grunt-work models on Miki (Mac mini M4 Pro, 24 GB RAM). Scope: install LM Studio/lms if possible, download practical GGUF candidates, benchmark local execution for coding/bash/JSON/tool-style grunt work, inspect memory/CPU behavior on Apple Silicon, then test an OpenAI-compatible endpoint path suitable for Hermes. Expected check-in: today after initial install/download/first model benchmark. Completion gate: concrete model recommendation with tested command/endpoints, observed reliability, and next steps for Hermes integration.

Activity

done

Luci is working...

Details — Done · High · Luci ▼

State

Done

Closed

State

Priority

People

Owner (assigned to)

Controller

Timing / Details

Due Date

Snooze

Source api (human)

Project General

Created 4d ago

Updated 4d ago

Advanced / Operator evidence

Routing owner

Routes via

Operator console

Evidence

Ticket is done; runtime is closed. MC is visibility-only. Hermes Luci launches and gates work outside MC, then mirrors evidence/status here.

Workflow

Start Dev Review + QA ▾

Select phases to include:

Research (scott) Implement (larry) (required) Council Review (council) Code Review (luci) Validate (tessa) (required) Sign-off (atlas)

Agents

Review Gates

Decision

WAT routing: choose an agent, review gate, or decision. Buttons use the live runtime when one is attached.

system 4d ago

[visibility-only] Queued message recorded for Hermes Luci. MC did not claim the ticket or spawn a runtime.

api 4d ago

Progress update on Miki local model benchmark: Installed LM Studio 0.4.16+2 arm64 on Miki and verified lms CLI. Miki facts verified: - Mac mini, Apple M4 Pro, 14 CPU cores, 24 GB RAM, ~531 GB free disk at start. Downloaded and tested in LM Studio CLI: - qwen2.5-coder-14b-instruct, 8.99 GB: reliable load, ~25 tok/s, no swapouts during first benchmark; best correctness so far on Python fix, but sometimes wraps JSON/code in markdown and produced one Linux-centric bash command. - qwen3-coder-30b-a3b-instruct, 14.58 GB Q3_K_L selected by LM Studio: very fast ~56-58 tok/s, but failed the simple Python bug-fix test by dropping int handling; caused swapouts, so not currently my reliability pick despite speed. - gpt-oss-20b, 11.62 GB Q4_K_S: fastest ~58-73 tok/s but emits visible <think> reasoning and failed a simple code fix; not suitable as Hermes/grunt default unless we solve reasoning suppression. - devstral-small-2507, 14.33 GB Q4_K_M: slow ~16-17 tok/s and weaker outputs; not recommended. LM Studio OpenAI-compatible server test: - Started server on 127.0.0.1:1234. - Loaded qwen2.5-coder-14b-instruct as identifier qwen25-coder-14b. - /v1/models works. - /v1/chat/completions works; simple code fix returned in ~1.1s. Current provisional recommendation: - Best reliable default today: qwen2.5-coder-14b-instruct via LM Studio server. - Continue testing qwen2.5-coder-32b-instruct Q3_K_L; download is in progress. It may beat 14B on quality if memory pressure stays acceptable. Next: wait for 32B download, benchmark it, then decide whether the default should be 14B reliable or 32B higher-quality/tighter-memory.

api 4d ago

Qwen2.5-Coder-32B benchmark completed on Miki. MODEL qwen2.5-coder-32b-instruct Swapins: 3951 Swapouts: 105480 ===RESULT bash_safe exit 1 elapsed 22.36 === ===RESULT python_fix exit 1 elapsed 1.3 === ===RESULT json_extract exit 1 elapsed 1.28 === ===RESULT grunt_reasoning exit 1 elapsed 1.09 === after_ps after_memory Swapins: 16082 Swapouts: 337520 saved /Users/elmar/miki_lms_bench_qwen25_32b_q3kl.json

luci-board-manager 4d ago

[visibility-only controller] Routed this new Luci-owned technical ticket to internal Hermes Kanban card t_1fadbd52 on board mc-internal (assignee=codexbuilder), running in isolated workspace `/home/lucienne/workspace/state/kanban-worktrees/mc-4845-miki-local-model-benchmark` (verified worker pid 2493440 cwd matches). Scope: benchmark local coding/grunt-work models on Miki, verify LM Studio/lms/OpenAI-compatible endpoint path, and report concrete model recommendations or exact human CTA if Miki access/credentials are required. MC remains the visibility ledger; no MC runtime/pickup endpoints were used.

luci-board-manager 4d ago

[visibility-only controller] Gate rejected the first internal card t_1fadbd52 as premature/research-only: it did not verify Miki reachability, install/download a runner/model, or run any benchmark. Created continuation card t_7aae15fd in the same isolated workspace and verified worker pid 2495906 cwd `/home/lucienne/workspace/state/kanban-worktrees/mc-4845-miki-local-model-benchmark`. MC remains parked on the continuation; no MC runtime/pickup endpoints were used.

luci-board-manager 4d ago

[visibility-only controller] Gate accepted internal Kanban card t_7aae15fd. Verified the evidence artifact exists at `/home/lucienne/workspace/state/kanban-worktrees/mc-4845-miki-local-model-benchmark/MC-4845-miki-local-model-benchmark.md` and includes actual Miki reachability/SSH proof, LM Studio/lms status, llama.cpp install, GGUF download, OpenAI-compatible endpoint tests, resource snapshots, tunnel test from Luci to Miki, and a concrete recommendation. Outcome: use LM Studio `qwen25-coder-14b` first via a managed SSH tunnel to Miki (`http://127.0.0.1:51234/v1`, model `qwen25-coder-14b`); keep llama.cpp Qwen2.5-Coder-3B Q5_K_M as a lightweight fallback. Closing MC-4845 as done. No MC runtime/pickup endpoints were used.

Live ▼

No activity yet

←