agent-watch — week 2026-W23

Rolling weekly report. New releases append below as detected.

Week label: 2026-W23
First detection: 2026-06-01T05:00:44.771146+00:00

openclaw v2026.5.31-beta.4 — 2026-06-01T04:15:10Z

Repo: https://github.com/OpenClaw/openclaw
Release: https://github.com/openclaw/openclaw/releases/tag/v2026.5.31-beta.4
Detected: 2026-06-01T05:00:44.771146+00:00
Borrowable ideas: 2

OpenClaw v2026.5.31-beta.4 — Release Review for Luci

Executive Summary

Tool recovery + session stability — interrupted tool calls, stale bindings, media retries now recover cleanly; directly applicable to mc_pickup worker resilience
Provider/request timeouts — better bounds on OAuth, media, service probes; useful for scheduler.py provider injection
Skills/plugin stale-snapshot handling — cleaner disabled-skill recovery; marginal win for skill-evolver

Nothing mission-critical for the current Luci setup, but two incremental hardening items are worth backporting, and a third (stale-snapshot handling) merits an audit first.

Features Worth Borrowing

Feature	OpenClaw Purpose	Luci Relevance	Action
Tool call recovery (#88129, #88136, #88141, #88162, #88182)	Agents resume from interrupted tool calls, stale session bindings, compaction handoffs	mc_pickup workers spawn independent CLI sessions; interrupted tool calls (API timeout, network drop, daemon stale) currently have no recovery path. Worker can hang or emit partial state → zombie MC ticket. Recovery logic would reduce hung-worker incidents.	Yes. Backport session-recovery pattern into mc_pickup worker subprocess wrap (test via network interrupt scenario)
Request timeout bounding (provider, OAuth, media, service probes)	Cap request lifetimes before they hang a run	scheduler.py calls Anthropic SDK + provider-profile injection; no explicit timeout per request. Slow provider (GLM API timeout, Kimi latency spike) hangs task runner → blocks tick scheduler heartbeat. Better timeouts = predictable failure + retry.	Yes. Add per-provider request timeout config + fallback to main-loop tick timeout; prevents scheduler cascade.
Stale disabled-skill snapshot handling (#79072, #79173)	Plugin/skill loaders handle disabled snapshots, emit recovery guidance	Luci skill-evolver auto-creates/disables skills; stale disabled snapshots (disabled skill re-enabled mid-session, loader caches old disabled state) can silently drop skills. Clearer recovery = better visibility into skill state churn.	Maybe. Audit skill-evolver for stale-snapshot risk; low priority (rare in practice). Skip unless the audit surfaces a gap.
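The "worker subprocess wrap" action in the table above could take roughly the following shape. This is a minimal sketch, not OpenClaw's recovery logic and not the existing mc_pickup code: `run_worker`, its parameters, and the retry count are hypothetical names chosen for illustration. It only shows the idea of bounding a worker's lifetime and retrying a hung or failed run a fixed number of times.

```python
import subprocess
import sys


def run_worker(cmd, timeout_s=30.0, max_attempts=3):
    """Run a worker command, retrying if it hangs or exits non-zero.

    Returns (ok, attempts_used, last_error). All names are hypothetical.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the timeout elapses, so a hung worker cannot linger.
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            last_error = f"timed out after {timeout_s}s"
            continue
        if result.returncode == 0:
            return True, attempt, None
        last_error = f"exit code {result.returncode}"
    return False, max_attempts, last_error


if __name__ == "__main__":
    print(run_worker([sys.executable, "-c", "print('done')"], timeout_s=10))
```

Blind retries are only safe if the worker is idempotent; a worker that has already written partial ticket state would need a cleanup step before the next attempt.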

Not Relevant

Multi-channel delivery (Slack, Discord, Teams, WhatsApp, iMessage) — Luci is single-channel (Telegram via ccgram); no expansion planned
Workboard, enterprise orchestration — Luci's "orchestration" is ticket dispatch + scheduler + worker subprocesses; workboard is multi-team/federation feature
iOS push relay, external Copilot packaging — server-side infrastructure not applicable to the hosted Hetzner box
SecretRef plugin manifests — credential templating; Luci uses vault.db for secrets, not plugin manifests

Questions for Elmar

Tool recovery: Worth a backport now, or acceptable to wait for next major upgrade cycle?
Request timeouts: Is scheduler.py experiencing timeout cascades, or is this preventive hardening?

openclaw v2026.6.2-alpha.1 — 2026-06-02T02:56:29Z

Repo: https://github.com/OpenClaw/openclaw
Release: https://github.com/openclaw/openclaw/releases/tag/v2026.6.2-alpha.1
Detected: 2026-06-02T05:00:53.630833+00:00
Borrowable ideas: 0

openclaw 2026.6.2-alpha.1 Review for Luci

Executive Summary

Worker/session recovery: Agents recover more cleanly from interrupted tool calls and stale bindings — benefits Luci's background mc_pickup workers passively, no action needed.
Provider timeout bounding: Requests now bound on OAuth lifetimes, media downloads, polling retries — improves scheduler task reliability passively.
Skill loading diagnostics: Better error handling on stale/disabled plugin snapshots — helps auto-skill-evolver's recovery path.

Features New to openclaw — Load-Bearing Only

Feature	Relevance to Luci	Rationale
Worker recovery from tool interrupts/stale bindings	Passive benefit	mc_pickup spawns background workers; cleaner recovery from stale sessions = fewer silent task failures. Automatic as base improves.
Provider request timeout bounds	Passive benefit	scheduler.py injects provider env per runtime_profile; unbounded OAuth/media/polling can stall tasks. Bounds reduce hanging tasks.
Skill/plugin loading error clarity	Passive benefit	~/.claude/skills/ + auto-skill-evolver inherit clearer diagnostics on disabled snapshots, plugin load failures. Better visibility on skill-refresh errors.
SQLite-backed plugin install ledger	Passive benefit	Reduces filesystem scanning on restarts; may help skill-evolver detect stale plugin state faster. Secondary.
Workboard orchestration primitives	Watch, not adopt yet	New multi-agent coordination surface. If it's an alternative to MC tickets for agent routing, could replace/augment mc_pickup. Needs docs to assess.
Skill Workshop proposal review UI	Watch, not adopt yet	Human-driven skill review (propose → approve → ship). auto-skill-evolver already generates proposals; Skill Workshop could streamline the review gate. Not urgent; verify it doesn't duplicate auto-evolver.

Skip

Tokenjuice, Copilot plugin externalizations — not in Luci's stack.
iOS push relay, iMessage state, realtime Talk — mobile channels, irrelevant to ccgram/Telegram/CLI dispatch.
Workboard SecretRef manifests, external packaging — enterprise infra, not relevant to single-tenant MC.
Chat/Control UI polish (history loading, markdown streaming, draft caching) — not used by Luci's headless scheduler/worker dispatch.

Verdict

No action required. All three load-bearing improvements are passive base-system gains. Luci's mc_pickup, scheduler, and skill-evolver inherit them automatically. Watch Workboard and Skill Workshop when docs land; they may become relevant for future orchestration redesigns, but not now.

openclaw v2026.6.3-alpha.1 — 2026-06-03T04:06:15Z

Repo: https://github.com/OpenClaw/openclaw
Release: https://github.com/openclaw/openclaw/releases/tag/v2026.6.3-alpha.1
Detected: 2026-06-03T05:00:46.155863+00:00
Borrowable ideas: 3

OpenClaw 2026.6.3-alpha.1 Review for Luci

Executive Summary

Tool-call recovery + session binding cleanup reduces churn when workers (mc_pickup, scheduler tasks) hit network dropouts or stale bindings
Timeout/retry bounding across OAuth, polling, media prevents the hangs that plague long-running jobs
Everything else is enterprise governance, mobile delivery, or UI polish — not applicable to a single-orchestrator setup

Features Worth Borrowing

Feature	Relevance to MC/Luci	One-liner
Agent tool-call recovery (#88129, #88136, #88141)	MC workers (mc_pickup) + scheduler task runtimes spawn agents; cleaner binding recovery = fewer orphan sessions	Adopt: add explicit session recovery on network dropout in worker bootstrap
Timeout/retry bounds for OAuth, polling, media	Scheduler tasks & mc_pickup workers can hang waiting on Anthropic API, GCS media, or Google OAuth; explicit caps prevent stalls	Adopt: cap all external calls in scheduler.py + mc_pickup.py with timeouts; audit `claude` CLI invocation patterns
Provider model metadata caching (OpenRouter, Copilot)	Runtime profile routing (runtime_profile field) picks Anthropic/GLM/Kimi/MiniMax; model catalog + caching tightens cost tracking	Adopt: cache provider models in vault.db (new `provider_models` table), refresh on scheduler heartbeat, use in runtime-profile audit
Disabled-snapshot handling for skill loading	Auto-skill-evolver creates skills; stale disabled snapshots = unclear recovery	Skip: auto-skill-evolver is lightweight; not worth the governance overhead yet
Skill Workshop proposal flow	Guarded skill creation with review states, versioned proposals, rollback metadata	Skip: auto-skill-evolver auto-triggers on repeat patterns; proposal review is overkill for Luci's use case
SQLite state migration (iMessage ledgers, plugin installs)	Mirrors Luci's vault.db strategy; pattern is sound, already in use	Neutral: Luci already uses SQLite for activity_log + scheduled-task state
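The provider-model caching row above proposes a `provider_models` table refreshed on the scheduler heartbeat. A minimal sketch of what that could look like with the standard `sqlite3` module follows. The column layout, function names, and staleness rule are assumptions for illustration; the real vault.db schema is not shown in this report, and the model names in any call are placeholders.

```python
import sqlite3
import time

# Hypothetical schema; only the table name comes from the review above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS provider_models (
    provider   TEXT NOT NULL,
    model      TEXT NOT NULL,
    fetched_at REAL NOT NULL,
    PRIMARY KEY (provider, model)
)
"""


def open_cache(path=":memory:"):
    """Open the database and make sure the cache table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def refresh_models(conn, provider, models, now=None):
    """Replace the cached model list for one provider atomically."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM provider_models WHERE provider = ?", (provider,))
        conn.executemany(
            "INSERT INTO provider_models (provider, model, fetched_at) VALUES (?, ?, ?)",
            [(provider, m, now) for m in models],
        )


def cached_models(conn, provider, max_age_s, now=None):
    """Return cached model names, or None if the cache is empty or stale."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT model, fetched_at FROM provider_models WHERE provider = ? ORDER BY model",
        (provider,),
    ).fetchall()
    if not rows or now - min(r[1] for r in rows) > max_age_s:
        return None
    return [r[0] for r in rows]
```

A heartbeat handler would call `cached_models` first and only hit the provider's catalog endpoint, followed by `refresh_models`, when it returns `None`.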

Skip

Multi-team/workboard orchestration — Luci is the orchestrator
Copilot/GitHub agent plugin — out of scope
iMessage/iOS/WhatsApp/Teams delivery — Luci owns Telegram (ccgram) only
Billing, localization, UI polish

Action Items

If the overhead is trivial:
- Add session-recovery guard to mc_pickup.py worker spawn (catch stale bindings on failed tool calls, re-attach)
- Add timeout caps to scheduler.py task execution (prevent hangs on Anthropic API, GCS, OAuth)

If worth the complexity:
- Extend vault.db schema with provider model metadata table; refresh on heartbeat

Otherwise: pattern catalogue only (reference for next reliability crisis).
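For the timeout-cap action item, one possible shape is a per-provider timeout that is always clamped to whatever remains of the current scheduler tick, so a slow provider can never outlive the tick. This is a sketch under stated assumptions: the timeout values, `PROVIDER_TIMEOUTS_S`, and `resolve_timeout` are hypothetical and do not describe the existing scheduler.py.

```python
DEFAULT_TIMEOUT_S = 60.0

# Hypothetical per-provider caps, keyed by runtime_profile value.
PROVIDER_TIMEOUTS_S = {
    "anthropic": 120.0,
    "glm": 45.0,
    "kimi": 45.0,
    "minimax": 45.0,
}


def resolve_timeout(runtime_profile, tick_timeout_s, started_at, now):
    """Pick a request timeout that never outlives the current tick.

    started_at and now are timestamps on the same clock (for example
    values from time.monotonic()).
    """
    configured = PROVIDER_TIMEOUTS_S.get(runtime_profile, DEFAULT_TIMEOUT_S)
    remaining = tick_timeout_s - (now - started_at)
    if remaining <= 0:
        raise TimeoutError("tick budget exhausted before the request started")
    return min(configured, remaining)
```

The returned value would then be passed as the `timeout` argument of whichever HTTP or SDK call the task makes, which keeps the failure predictable and leaves the retry decision to the scheduler.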

openclaw v2026.6.4-alpha.1 — 2026-06-04T03:40:03Z

Repo: https://github.com/OpenClaw/openclaw
Release: https://github.com/openclaw/openclaw/releases/tag/v2026.6.4-alpha.1
Detected: 2026-06-04T05:00:40.431690+00:00
Borrowable ideas: 4

openclaw v2026.6.4-alpha.1 Review for Luci

Executive Summary

Install policy replaces dangerous-code scanner with operator gates + doctor checks — safer skill auto-updates
Telegram hardening adds admin-rights enforcement, DM exec allowlists, transcript durability — ccgram defensive upgrades
Provider fanout for custom runtimes + bundled aliases — scheduler cost optimization for parallel subagent council

Features Worth Borrowing into MC/Luci

Feature	Relevance	Rationale
Skill install operator policy + doctor	✅ Yes	auto-skill-evolver currently installs without gates. Operator policy + doctor checks reduce supply-chain risk (MC-4657 hardening precedent).
Telegram admin writeback + DM exec allowlists	✅ Yes	ccgram owns polling; approval gates + admin checks harden Telegram control plane (Telegram lock critical per CLAUDE.md).
Provider runtime fanout	✅ Yes	scheduler already routes via `runtime_profile`; bundled aliases + custom fanout reduce model-spawn overhead when dev-loop spawns council (Opus/Sonnet/Gemini/Kimi/GLM in parallel).
Session write-lock recovery	✅ Yes	Defensive. mc_pickup dispatcher spawns per-ticket workers; lock-release failures could block ticket dispatch. Low-cost hardening.
Streaming text visibility + ACK reconcile	⚠️ Maybe	MC's SSE broadcaster already streams task output. ACK timing metadata is nice-to-have for debugging UI/state races, not blocking.
Prompt cache boundaries	❓ Defer	Luci/workers don't use caching yet. Worth scanning if cost-optimization becomes priority.
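The DM exec allowlist idea from the table can be illustrated with a small check that requires both the sender and the command to be allowlisted. This is a sketch only: `ExecPolicy`, `may_exec`, and the example IDs are hypothetical and are not taken from ccgram or OpenClaw. The only Telegram-specific detail used is that commands may arrive in the `/command@botname` form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecPolicy:
    """Hypothetical allowlist: who may run which slash commands via DM."""
    allowed_user_ids: frozenset
    allowed_commands: frozenset


def may_exec(policy, user_id, text):
    """Allow a DM command only if sender and command are both allowlisted."""
    parts = text.strip().split()
    if not parts or not parts[0].startswith("/"):
        return False
    # Strip an optional bot suffix: "/status@some_bot" -> "/status"
    command = parts[0].split("@", 1)[0].lower()
    return user_id in policy.allowed_user_ids and command in policy.allowed_commands
```

Defaulting to deny for anything that is not a recognised command keeps free-text messages from ever reaching an exec path.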

Skip

Multi-team org, enterprise billing, ClawHub marketplace — Luci is single-tenant Hetzner-only
iOS/Android/Windows release automation, Matrix plugin specifics — not applicable
Docs/UI polish — Luci is Flask CLI + web board, not consumer product

Action

Install policy for skills — open ticket to backport operator gates into auto-skill-evolver (model: install-policy enum, doctor checks on skill source)
Telegram approval gates — vet ccgram's DM allowlist pattern, surface to Elmar for config (who can exec Telegram commands?)
Provider fanout — review scheduler's runtime_profile injection; add bundled aliases table if spawning 5+ model council votes gets common
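The first action item mentions an install-policy enum plus doctor checks on the skill source. A minimal sketch of that gate is below. The enum members, the trusted-host list, and the specific checks are assumptions made for illustration; the review does not describe OpenClaw's actual policy values or checks.

```python
from enum import Enum
from urllib.parse import urlparse


class InstallPolicy(Enum):
    """Hypothetical operator policy for skill installs."""
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    ALLOW = "allow"


TRUSTED_HOSTS = {"github.com"}  # assumption: operator-maintained list


def doctor_checks(source_url):
    """Return a list of problems found with a skill source URL."""
    problems = []
    parsed = urlparse(source_url)
    if parsed.scheme != "https":
        problems.append("source is not https")
    if parsed.hostname not in TRUSTED_HOSTS:
        problems.append(f"untrusted host: {parsed.hostname}")
    return problems


def decide_install(policy, source_url, approved_by_operator=False):
    """Return (allowed, reasons) for one proposed skill install."""
    if policy is InstallPolicy.DENY:
        return False, ["install policy is deny"]
    problems = doctor_checks(source_url)
    if problems:
        return False, problems
    if policy is InstallPolicy.REQUIRE_APPROVAL and not approved_by_operator:
        return False, ["operator approval required"]
    return True, []
```

Running the doctor checks before the approval check means an operator is never asked to approve a source that would be rejected anyway.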

goose v1.37.0 — 2026-06-03T19:46:33Z

Repo: https://github.com/block/goose
Release: https://github.com/aaif-goose/goose/releases/tag/v1.37.0
Detected: 2026-06-04T05:01:26.886354+00:00
Borrowable ideas: 4

Goose v1.37.0 — Luci Release Review

Executive Summary

3 backport candidates: PreToolUse denial hooks (enforce policy gates), configurable tool output limits (prevent token waste), provider model exposure for dynamic council dispatch
Declarative provider system: Reduces hardcoding in scheduler task routing, but requires schema migration
Not actionable: 80% of release is ACP server (web API), TUI, and localization — Luci is headless orchestrator on MC dashboard + Telegram

Load-Bearing Features Worth Borrowing

Feature	Luci System	Rationale	Effort
PreToolUse denial hooks (#9304)	dev-loop / scheduler	Luci already blocks Edit/Write/Bash before dev-loop via `require-dev-loop.sh`. Goose's hook pattern could replace or strengthen that gate.	Low — inspect goose implementation, backport pattern if cleaner
GOOSE_MAX_TOOL_RESPONSE_SIZE (#9256)	context-mode sandbox	Luci's ctx_execute already contains large output, but global limit prevents workers from accidentally flooding context. Forward-compatible with vault.db query results.	Low — add env var to scheduler + MC worker spawn
Provider model exposure + ACP system prompt setter (#9475, #9478)	scheduler runtime_profile	Luci dispatches multi-model council (Opus/Sonnet/Codex/Gemini/Kimi/GLM) hardcoded in tasks. Exposing raw models + per-session system-prompt could let scheduler dynamically route by model capability (e.g., "use Opus for council, Sonnet for Tier 1 fix").	Medium — schema change to task runner, update mc_pickup provider injection
Declarative provider system (Perplexity, Alibaba, Databricks, etc.) (#9443, #9254, #9274)	scheduler task routing	Currently scheduler hardcodes provider→env mapping. Goose's declarative shape (name, endpoint, auth-field) could replace scheduler's `_apply_provider_profile_env()` with a config table, reducing code + enabling new providers without redeploy.	Medium-to-High — requires task/provider schema migration, test coverage
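To make the PreToolUse denial idea from the table concrete, here is a generic sketch of a hook chain where the first hook that returns a reason blocks the tool call. It does not reproduce Goose's hook API or the existing `require-dev-loop.sh` script; `run_pre_tool_hooks` and `require_dev_loop` are hypothetical names, and only the blocked tool names (Edit, Write, Bash) come from the review.

```python
def run_pre_tool_hooks(hooks, tool_name, tool_input):
    """Return (allowed, reason). The first hook that denies wins."""
    for hook in hooks:
        reason = hook(tool_name, tool_input)
        if reason is not None:
            return False, reason
    return True, None


def require_dev_loop(is_active):
    """Build a hook that blocks mutating tools until dev-loop is active.

    is_active is a zero-argument callable supplied by the caller.
    """
    gated = {"Edit", "Write", "Bash"}

    def hook(tool_name, tool_input):
        if tool_name in gated and not is_active():
            return f"{tool_name} is blocked until dev-loop is active"
        return None

    return hook
```

Because each hook returns either `None` or a human-readable reason, the denial message can be surfaced directly in the ticket or worker log.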

Not Applicable (Skip)

ACP server features (session system prompt, session list pagination, pass cwd, replay images, slash commands) — Luci is headless orchestrator; ACP is Goose's web API. Use MC dashboard instead.
TUI & localization — server agent, no desktop needed. Russian/Turkish irrelevant to ops.
Harbor eval runner — evaluation infra, not operational.
Goose review / goose://deep-links — Luci uses review skill + MC tickets, not Goose CLI.
/goal command, Summon subagent — Goose internals; Luci uses dev-loop tier routing + explicit task gates.
xAI SuperGrok, Opt-in leaderboard — subscription providers. Interesting for diversity, but low priority vs. existing zoo.

Recommendation

Backport PreToolUse denial hooks + MAX_TOOL_RESPONSE_SIZE immediately (both low-risk, high-safety gain).
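A global tool-output cap of the kind recommended here could be as small as the sketch below. The env var name `LUCI_MAX_TOOL_RESPONSE_SIZE`, the default, and the choice to count characters are all assumptions; the review does not state which unit Goose's `GOOSE_MAX_TOOL_RESPONSE_SIZE` uses or how Goose marks truncation.

```python
import os


def max_size_from_env(default=50_000):
    """Read the cap from a hypothetical env var, falling back to default."""
    raw = os.environ.get("LUCI_MAX_TOOL_RESPONSE_SIZE")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 0 else default


def cap_tool_output(text, max_chars):
    """Truncate tool output and say how much was dropped."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[truncated {omitted} characters]"
```

Appending an explicit truncation marker lets the worker know output was cut, so it can re-run the tool with a narrower query instead of reasoning over partial data.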

Defer provider model exposure + declarative system to a follow-up ticket: useful for council dispatch tuning + onboarding new models, but requires schema planning. Not blocking.
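When the follow-up ticket is written, the declarative shape (name, endpoint, auth-field) could look roughly like this. It is a sketch, not Goose's format and not a drop-in for `_apply_provider_profile_env()`: the `ProviderSpec` fields, the output env var names, and the placeholder endpoints are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSpec:
    """One row of a hypothetical provider config table."""
    name: str
    endpoint: str
    auth_env: str  # name of the secret that holds the API key


# Placeholder endpoints; real values would live in config, not code.
PROVIDERS = {
    spec.name: spec
    for spec in [
        ProviderSpec("glm", "https://glm.example.invalid/v1", "GLM_API_KEY"),
        ProviderSpec("kimi", "https://kimi.example.invalid/v1", "KIMI_API_KEY"),
    ]
}


def provider_env(name, secrets, providers=PROVIDERS):
    """Build the env vars to inject for one provider, or raise KeyError."""
    spec = providers.get(name)
    if spec is None:
        raise KeyError(f"unknown provider: {name}")
    if spec.auth_env not in secrets:
        raise KeyError(f"missing secret: {spec.auth_env}")
    return {
        "PROVIDER_BASE_URL": spec.endpoint,
        "PROVIDER_API_KEY": secrets[spec.auth_env],
    }
```

With this shape, adding a provider becomes a data change (one more `ProviderSpec` row) rather than a new code branch.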

Open a ticket for auto-skill-evolver to adopt the operator install-policy pattern from openclaw v2026.6.4-alpha.1 (the prior release review already identified this as a backport candidate).

openclaw v2026.6.5-alpha.1 — 2026-06-05T04:03:44Z

Repo: https://github.com/OpenClaw/openclaw
Release: https://github.com/openclaw/openclaw/releases/tag/v2026.6.5-alpha.1
Detected: 2026-06-05T05:00:44.772406+00:00
Borrowable ideas: 4

openclaw v2026.6.5-alpha.1 Release Review

Executive Summary

Telegram safety hardening (admin writeback, DM approval allowlists, poll modifiers) directly improves ccgram.service stability — MC-2617's single-poller lock now has stricter boundaries
Durable sends on transcript mirror failure — outbound path resilience for Telegram when logging systems fail (currently a single mirror error kills the send)
Custom-provider runtime fanout (Gemini stop sequences, Kimi cache markers) enables tighter control over GLM/Kimi/MiniMax routing than current CLI string injection

Features Worth Borrowing

Feature	Load-Bearing for Luci
Telegram admin writeback + DM approval allowlists	ccgram.service is sole Telegram poller; new role verification + exec approval gates tighten MC ticket approval routing. Check if ccgram enforces these.
Durable sends on transcript mirror failure	Currently ccgram fails the entire Telegram send if transcript mirroring (e.g., to audit log) fails. This decouples mirror reliability from send delivery.
Custom-provider runtime fanout	scheduler.py injects `runtime_profile` as env string (anthropic/glm/kimi/minimax); openclaw now routes context-aware. Could replace string-injection with structured dispatch for GLM/Kimi/MiniMax cost control.
Policy load-time rejection (corrupt shells, unsupported keys, unsafe exec)	If policy enforcement tightens, MC ticket workers + skill-loader may hit new rejections. Verify no false positives on existing worker spawns.
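The durable-send row in the table above describes decoupling delivery from transcript mirroring. A minimal sketch of that ordering is below, assuming the caller supplies `send` and `mirror` callables; both names and their signatures are hypothetical and do not reflect ccgram's or OpenClaw's code.

```python
import logging

log = logging.getLogger("durable_send")


def send_with_mirror(send, mirror, chat_id, text):
    """Deliver first, then mirror; a mirror failure is logged, not raised.

    send(chat_id, text) must return a message id.
    mirror(chat_id, text, message_id) writes the transcript copy.
    """
    message_id = send(chat_id, text)  # delivery errors still propagate
    try:
        mirror(chat_id, text, message_id)
        mirrored = True
    except Exception:
        # The message is already delivered; record the gap for later repair.
        log.exception("transcript mirror failed for message %s", message_id)
        mirrored = False
    return message_id, mirrored
```

Returning the `mirrored` flag lets the caller queue a backfill for the audit log instead of silently losing the transcript entry.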

Skip

Discord/Feishu/WhatsApp (Telegram-only setup on Luci)
Workboard/Android/Chat UI (headless)
Plugin install policy shift (affects skill-loader only; low relevance if Claude Code core unchanged)
Windows installer, docs, E2E polish

Recommendation

Medium priority backport: Telegram safety gates + durable sends. Low priority: custom-provider fanout (forward-looking for cost scaling). Test against ccgram.service on the next scheduler run and report any admin-role enforcement side effects.