Point it at any dashboard, app or website and it spins up a panel of five (or six) tester-user agents — each wearing a different hat — runs them in parallel, then folds everything into one prioritised roadmap and ships the obvious quick wins. This is the visual walkthrough: what it does, how to run it, and exactly what to expect back.
Think of it as hiring a small focus group of expert testers for an hour — except they all work at once, finish in minutes, and hand you a single agreed list instead of five conflicting opinions. Each tester only looks; none of them touch the code. After they report, you (with their findings) decide what to fix, and the unambiguous quick wins get shipped on the spot.
It finds problems: confusing screens, missing labels, jargon, dead buttons, bloat, wrong numbers.
A gate keeps them out. Tester Panel is the broad discovery sweep; your accuracy/regression tests are the narrow guard. They do different jobs, so run both.
5 · Core tester lenses, run in parallel
6 · Lenses if the project shows numbers (+verifier)
2 · Waves: panel → verify & produce
1 · Prioritised roadmap you actually act on
How to run it
Three ways in
All three land in the same place. The skill auto-triggers on phrases like “test the project”, “run the tester panel”, “is this user-friendly”, “find improvements”, or “audit the dashboard”. You don't have to remember the exact command.
The workflow at a glance
Panel → verify → consolidate → ship
Within each wave the agents fan out in parallel: five or more work simultaneously, so the whole sweep takes minutes rather than a serial slog. Wave 2 starts once Wave 1 has been consolidated, and everything funnels into one consolidated list at the end.
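A minimal sketch of that fan-out, assuming each agent can be modelled as an async call. The `run_agent` stub and the agent names are hypothetical placeholders, not the skill's real interface; the point is only that agents inside a wave run concurrently while the two waves run one after the other.

```python
import asyncio


async def run_agent(name: str) -> str:
    """Hypothetical stand-in for one review agent; a real one would inspect the project."""
    await asyncio.sleep(0.01)  # simulate the agent doing its review
    return f"{name}: done"


async def run_wave(names: list[str]) -> list[str]:
    """Start every agent in the wave at once; results come back in input order."""
    return await asyncio.gather(*(run_agent(name) for name in names))


async def sweep(wave1: list[str], wave2: list[str]) -> tuple[list[str], list[str]]:
    """Run Wave 1 in parallel, then Wave 2 in parallel once Wave 1 has returned."""
    first = await run_wave(wave1)
    second = await run_wave(wave2)
    return first, second
```

Calling `asyncio.run(sweep([...], [...]))` returns both report lists; elapsed time is roughly that of the slowest agent in each wave, not the sum of all agents.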
The panel — six hats, one tap to explore
🧊 ① Cold new user · Knows nothing. Is it self-explanatory?
🎯 ② Decision-maker · The senior it's built for. Is it useful?
⚡ ③ Power user · Daily-use friendliness & control.
🔍 ④ UX sweep · Every screen, every state.
🧱 ⑤ IA skeptic · Bloat, redundancy, what to cut.
🧮 ⑥ Accuracy verifier · Only if it shows numbers.
What you actually get back
Outputs
1 · A prioritised punch list per tester
Each agent returns a tight list in one fixed shape, under 500 words, so they stack cleanly:
[HIGH|MED|LOW] — area — finding — concrete fix
Plus a per-screen verdict: “would anyone use this, and what decision does it drive?”
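Because every tester uses the same line shape, the lists are easy to read mechanically. The sketch below is one way to parse a single finding line, assuming the separator is the dash shown in the format above; the `Finding` class and `parse_finding` function are illustrative names, not part of the skill.

```python
import re
from dataclasses import dataclass

SEP = "\u2014"  # the dash character used between fields in the report format


@dataclass(frozen=True)
class Finding:
    priority: str
    area: str
    finding: str
    fix: str


# Priority tag in square brackets, then the separator, then the remaining fields.
_LINE = re.compile(r"^\[(HIGH|MED|LOW)\]\s*" + SEP + r"\s*(.+)$")


def parse_finding(line: str) -> Finding:
    """Split one report line into its four fields, or raise ValueError."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a finding line: {line!r}")
    priority, rest = match.groups()
    # Split at most twice so the fix text may itself contain the separator.
    parts = [part.strip() for part in rest.split(SEP, 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected area, finding and fix: {line!r}")
    return Finding(priority, *parts)
```

A line that does not start with one of the three priority tags is rejected rather than guessed at, which keeps malformed reports visible.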
2 · One consolidated roadmap
Everything de-duplicated and split into two buckets:
SHIP NOW · unambiguous, low-risk quick wins
DECIDE · naming, restructure, big features (your call)
Anything that clashes with a past instruction is surfaced as a question, never silently overridden.
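The de-duplicate-and-split step can be pictured roughly as follows. This is a sketch under stated assumptions: duplicates are matched on area plus finding text, and a hypothetical `needs_decision` flag marks items that belong in DECIDE. How the skill actually judges duplicates or ambiguity is not specified here.

```python
from dataclasses import dataclass, replace

RANK = {"HIGH": 0, "MED": 1, "LOW": 2}


@dataclass(frozen=True)
class Item:
    priority: str
    area: str
    finding: str
    fix: str
    needs_decision: bool = False  # hypothetical flag: True means "your call"


def consolidate(items: list[Item]) -> dict[str, list[Item]]:
    """Merge duplicate findings, then split them into SHIP NOW and DECIDE."""
    best: dict[tuple[str, str], Item] = {}
    for item in items:
        key = (item.area.lower(), item.finding.lower())
        kept = best.get(key)
        if kept is None:
            best[key] = item
            continue
        # Keep the higher-priority copy; if any tester flagged it as a decision, it stays one.
        winner = item if RANK[item.priority] < RANK[kept.priority] else kept
        best[key] = replace(winner, needs_decision=kept.needs_decision or item.needs_decision)
    ordered = sorted(best.values(), key=lambda entry: RANK[entry.priority])
    return {
        "SHIP NOW": [entry for entry in ordered if not entry.needs_decision],
        "DECIDE": [entry for entry in ordered if entry.needs_decision],
    }
```

Exact-text matching is deliberately naive; two testers describing the same problem in different words would need a fuzzier comparison or a human pass.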
Example consolidated output
Pri  | Area        | Finding                                | Fix
HIGH | Search      | Search box renders but does nothing    | Wire the input handler / remove it
HIGH | Orientation | No per-tab description; new users lost | Add a one-line banner per tab
MED  | IA          | 22 tabs, much overlap                  | Consolidate to ~12 (decision)
MED  | Scroll      | Header + first column scroll away      | Make them sticky
LOW  | Numbers     | “Anomaly” was an incomplete period     | Add an incomplete-period flag
Then it ships the quick wins. Unambiguous high-value, low-risk fixes (descriptions, sticky headers, confirmed-bug fixes, search) get implemented and verified in a real browser at full viewport — and if you have a regression gate, it runs before reporting done.
Step by step, what happens
Step 0 — Establish context
States what the thing is, who uses it, where the source & built files are, how to view it live (path + URL/port), and how agents should inspect it — read source and rendered output, in a real browser at 1440×900. If there's a known set of screens, it lists them so every tester starts from the same map.
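One way to picture the shared context is as a small record that renders the same briefing text for every tester. The `PanelContext` class and its field names are hypothetical; only the listed items and the 1440×900 viewport come from the step above.

```python
from dataclasses import dataclass, field


@dataclass
class PanelContext:
    what: str          # what the thing is
    audience: str      # who uses it
    source_path: str   # where the source files live
    built_path: str    # where the built files live
    live_url: str      # how to view it live (path or URL with port)
    viewport: tuple[int, int] = (1440, 900)
    screens: list[str] = field(default_factory=list)

    def brief(self) -> str:
        """Render the context as plain text shared by every tester."""
        width, height = self.viewport
        lines = [
            f"Project: {self.what}",
            f"Users: {self.audience}",
            f"Source: {self.source_path}",
            f"Built output: {self.built_path}",
            f"View live: {self.live_url}",
            f"Inspect: read source and rendered output in a real browser at {width}x{height}",
        ]
        if self.screens:
            # A known screen list gives every tester the same map.
            lines.append("Screens: " + ", ".join(self.screens))
        return "\n".join(lines)
```

Any concrete values passed in (paths, URL, screen names) are the caller's own; nothing here discovers them automatically.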
Step 1 — Wave 1: the panel (parallel)
Five review-only agents launch at once, each with a self-contained brief and its own lens. None edit code. Each returns the [HIGH|MED|LOW] — area — finding — fix list plus per-screen verdicts. A sixth accuracy verifier joins if the project shows numbers.
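As an illustration of how the panel might be assembled, the sketch below picks five or six lenses and builds a self-contained brief for each. The lens names and taglines mirror the panel cards; `select_lenses`, `build_brief` and the brief wording are assumptions, not the skill's actual prompts.

```python
CORE_LENSES = {
    "Cold new user": "Knows nothing. Is it self-explanatory?",
    "Decision-maker": "The senior it's built for. Is it useful?",
    "Power user": "Daily-use friendliness & control.",
    "UX sweep": "Every screen, every state.",
    "IA skeptic": "Bloat, redundancy, what to cut.",
}
VERIFIER = ("Accuracy verifier", "Forensic tie-out of the numbers shown.")
REPORT_SHAPE = "[HIGH|MED|LOW] \u2014 area \u2014 finding \u2014 concrete fix"


def select_lenses(shows_numbers: bool) -> dict[str, str]:
    """Return the five core lenses, plus the verifier when numbers are shown."""
    lenses = dict(CORE_LENSES)
    if shows_numbers:
        lenses[VERIFIER[0]] = VERIFIER[1]
    return lenses


def build_brief(lens: str, focus: str, context: str) -> str:
    """Compose a brief that stands alone, so the agent needs no other input."""
    return (
        f"You are the {lens} tester. Focus: {focus}\n"
        f"{context}\n"
        "Review only: do not edit any code.\n"
        f"Return findings as: {REPORT_SHAPE} (under 500 words), plus a verdict per screen."
    )
```

Keeping the review-only rule and the report shape inside every brief means each agent can run without seeing the others' instructions.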
Step 2 — Wave 2: verify & produce (parallel)
After consolidating Wave 1, focused agents (a) definitively verify any flagged accuracy/functional bug — real numbers, root cause, file:line, and a verdict (real bug vs data artefact vs misread) — and (b) produce ready-to-paste deliverables: exact per-screen copy, glossary/tooltip text, a proposed leaner structure.
Step 3 — Consolidate
You get one prioritised roadmap: quick wins shippable now vs decisions needed. Conflicts with prior instructions are raised as explicit choices, with concrete options.
Step 4 — Implement the quick wins
The unambiguous, low-risk wins get built and verified in a real browser; the regression gate (if any) runs before “done”. In PKA this implementation pass routes through dev-loop, per the house rule that all code changes go through it — the audit waves stay direct.
When to reach for it
Fit / not-fit
✓ Great fit
You just built or reshaped a dashboard / report / app / site.
“Is this actually usable? What am I missing?”
It feels bloated and you want an outside eye on structure.
Numbers are shown and you want a forensic tie-out.
Before showing something to a senior stakeholder.
✗ Reach for something else
A specific known bug → just fix it (or dev-loop).
Automated accuracy/regression guarding → that's a gate, not this discovery sweep.
Pure visual QA of one page → the tessa agent.
Building the thing in the first place → design/build skills first, panel after.
Notes & FAQ
Good to know
Will it change my code without asking?
Waves 1 and 2 are strictly review-only. Only Step 4 edits, and only the unambiguous quick wins; anything debatable becomes a decision you approve first. In PKA those edits go through dev-loop.
How many agents is that, and is it expensive?
Five for a plain project, six if it shows numbers, plus a small Wave 2. The example run was 8 agents over 2 waves. They run in parallel, so it's fast; cost scales with project size, not with wall-clock time.
What about a website instead of a dashboard?
The personas adapt to the audience — e.g. first-time visitor, prospective customer, mobile user, conversion skeptic — and the accuracy verifier is dropped when there are no numbers.
What was fixed before install?
The original ZIP shipped a SKILL.md with no YAML frontmatter, so Claude Code couldn't register it — no /tester-panel command and no auto-trigger. A proper name + trigger-rich description block was added, plus a PKA note routing Step 4 through dev-loop. It's now installed and live.
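A quick check for that kind of problem might look like the sketch below. It assumes the frontmatter is a leading block delimited by `---` lines with simple `key: value` pairs, and that `name` and `description` are the fields that matter, as described above. It is not a full YAML parser, and the function names are illustrative.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level key/value pairs from a leading --- block, or {} if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Only unindented, non-comment lines are treated as top-level keys.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return {}  # opening delimiter with no closing one


def missing_skill_fields(text: str, required: tuple[str, ...] = ("name", "description")) -> list[str]:
    """List the required frontmatter fields that are absent or empty."""
    fields = read_frontmatter(text)
    return [key for key in required if not fields.get(key)]
```

Run against the contents of a SKILL.md, an empty result means both fields are present; a file with no frontmatter at all reports both as missing. For anything beyond this sanity check, a real YAML library would be the safer choice.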