Skill guide • Claude Code

Tester Panel — your on-demand usability jury

Point it at any dashboard, app or website and it spins up a panel of five (or six) tester-user agents — each wearing a different hat — runs them in parallel, then folds everything into one prioritised roadmap and ships the obvious quick wins. This is the visual walkthrough: what it does, how to run it, and exactly what to expect back.

✓ Installed: ~/.claude/skills/tester-panel/ Invoke: /tester-panel <project> 2 waves · 5–8 agents · 1 roadmap Review-only by default

What it is in one breath

Mental model

Think of it as hiring a small focus group of expert testers for an hour — except they all work at once, finish in minutes, and hand you a single agreed list instead of five conflicting opinions. Each tester only looks; none of them touch the code. After they report, you (with their findings) decide what to fix, and the unambiguous quick wins get shipped on the spot.

It finds problems. Confusing screens, missing labels, jargon, dead buttons, bloat, wrong numbers.

A gate keeps them out. Tester Panel is the broad discovery sweep; your accuracy/regression tests are the narrow guard. Different jobs — run both.

Core tester lenses, run in parallel

Lenses if the project shows numbers (+verifier)

Waves: panel → verify & produce

Prioritised roadmap you actually act on

How to run it

Three ways in

All three land in the same place. The skill auto-triggers on phrases like “test the project”, “run the tester panel”, “is this user-friendly”, “find improvements”, or “audit the dashboard”. You don't have to remember the exact command.

The workflow at a glance

Panel → verify → consolidate → ship

Both waves fan out in parallel — five-plus agents work simultaneously, so the whole sweep costs minutes, not a serial slog. Everything funnels into one consolidated list at the end.

The panel — six hats, one tap to explore

Click a card

🧊

① Cold new user

Knows nothing. Is it self-explanatory?

🎯

② Decision-maker

The senior it's built for. Is it useful?

⚡

③ Power user

Daily-use friendliness & control.

🔍

④ UX sweep

Every screen, every state.

🧱

⑤ IA skeptic

Bloat, redundancy, what to cut.

🧮

⑥ Accuracy verifier

Only if it shows numbers.

What you actually get back

Outputs

1 · A prioritised punch list per tester

Each agent returns a tight list in one fixed shape, under 500 words, so they stack cleanly:

[HIGH|MED|LOW] — area — finding — concrete fix

Plus a per-screen verdict: “would anyone use this, and what decision does it drive?”

2 · One consolidated roadmap

Everything de-duplicated and split into two buckets:

SHIP NOW unambiguous, low-risk quick wins
DECIDE naming, restructure, big features — your call

Anything that clashes with a past instruction is surfaced as a question, never silently overridden.

Example consolidated output

Pri	Area	Finding	Fix
HIGH	Search	Search box renders but does nothing	Wire the input handler / remove it
HIGH	Orientation	No per-tab description; new users lost	Add a one-line banner per tab
MED	IA	22 tabs, much overlap	Consolidate to ~12 (decision)
MED	Scroll	Header + first column scroll away	Make them sticky
LOW	Numbers	“Anomaly” was an incomplete period	Add an incomplete-period flag

Then it ships the quick wins. Unambiguous high-value, low-risk fixes (descriptions, sticky headers, confirmed-bug fixes, search) get implemented and verified in a real browser at full viewport — and if you have a regression gate, it runs before reporting done.

Step by step, what happens

Expand each

Step 0 — Establish context

States what the thing is, who uses it, where the source & built files are, how to view it live (path + URL/port), and how agents should inspect it — read source and rendered output, in a real browser at 1440×900. If there's a known set of screens, it lists them so every tester starts from the same map.

Step 1 — Wave 1: the panel (parallel)

Five review-only agents launch at once, each with a self-contained brief and its own lens. None edit code. Each returns the [HIGH|MED|LOW] — area — finding — fix list plus per-screen verdicts. A sixth accuracy verifier joins if the project shows numbers.

Step 2 — Wave 2: verify & produce (parallel)

After consolidating Wave 1, focused agents (a) definitively verify any flagged accuracy/functional bug — real numbers, root cause, file:line, and a verdict (real bug vs data artefact vs misread) — and (b) produce ready-to-paste deliverables: exact per-screen copy, glossary/tooltip text, a proposed leaner structure.

Step 3 — Consolidate

You get one prioritised roadmap: quick wins shippable now vs decisions needed. Conflicts with prior instructions are raised as explicit choices, with concrete options.

Step 4 — Implement the quick wins

The unambiguous, low-risk wins get built and verified in a real browser; the regression gate (if any) runs before “done”. In PKA this implementation pass routes through dev-loop, per the house rule that all code changes go through it — the audit waves stay direct.

When to reach for it

Fit / not-fit

✓ Great fit

You just built or reshaped a dashboard / report / app / site.
“Is this actually usable? What am I missing?”
It feels bloated and you want an outside eye on structure.
Numbers are shown and you want a forensic tie-out.
Before showing something to a senior stakeholder.

✗ Reach for something else

A specific known bug → just fix it (or dev-loop).
Automated accuracy/regression guarding → that's a gate, not this discovery sweep.
Pure visual QA of one page → the tessa agent.
Building the thing in the first place → design/build skills first, panel after.

Notes & FAQ

Good to know

Will it change my code without asking?

Wave 1 and 2 are strictly review-only. Only Step 4 edits, and only the unambiguous quick wins — anything debatable becomes a decision you approve first. In PKA those edits go through dev-loop.

How many agents is that, and is it expensive?

Five for a plain project, six if it shows numbers, plus a small Wave 2. The example run was 8 agents over 2 waves. They run in parallel, so it's fast; cost scales with project size, not wall-clock.

What about a website instead of a dashboard?

The personas adapt to the audience — e.g. first-time visitor, prospective customer, mobile user, conversion skeptic — and the accuracy verifier is dropped when there are no numbers.

What was fixed before install?

The original ZIP shipped a SKILL.md with no YAML frontmatter, so Claude Code couldn't register it — no /tester-panel command and no auto-trigger. A proper name + trigger-rich description block was added, plus a PKA note routing Step 4 through dev-loop. It's now installed and live.