naive-user

Drive your live web app as a source-blind, first-time user. An AI agent hovers, clicks, and types in a real browser, watches what actually happens, and reports the gaps (bugs, broken expectations, UX surprises, accessibility issues) before real users hit them.

It fills a gap left by the three usual sources of feedback:

  • Real users find these problems, but only after you ship.
  • You, testing manually, know how the app is built, so you cannot see it fresh.
  • Scripted end-to-end tests assert flows you already know. They cannot discover the ones you do not.

The agent forms expectations from only two sources: what is on the screen, and universal web conventions. It never reads your source code. It builds a knowledge base under qa/naive-user/<app>/ that lives outside the code, the way a real user's mental model does, and it compounds across runs.

What you get

qa/naive-user/<app>/
├── mental-model.md        # what a naive user believes the app does; refined every run
├── findings/<date>.md     # dated gap reports (Expected / Did / Observed / Gap / Severity / Repro)
└── screenshots/           # before/after evidence (gitignored)

Findings are severity-ranked: bug, broken-expectation, ux-gap, surprise, a11y.

Requirements

  • Node.js 18+ is the only hard requirement. The Playwright MCP server provisions its browser on first use. If you ever hit a missing-browser error, run npx playwright install chromium.
  • A running app, either already serving at a URL or startable from a startCommand in your config.
  • One of the supported harnesses below. Each needs agentic tool-calling plus the Playwright MCP browser tools, which is why instruction-only IDE rule hosts are not targeted.

Install

1. Add the plugin to your harness and wire Playwright MCP

The same MCP server command, npx @playwright/mcp@latest, works everywhere. Only OpenCode and Copilot change the shape of the entry.

Claude Code. Marketplace install. The plugin bundles the Playwright MCP, so it is one step:

/plugin marketplace add TechGardenCode/naive-user      # or a local path: ./naive-user
/plugin install naive-user@naive-user

Not using the plugin? Copy this repo's .mcp.json into your app repo root and drop skills/ and commands/ into your project's .claude/.

Codex. Drop the plugin in, then add the MCP server to ~/.codex/config.toml:

[mcp_servers.playwright]
command = "npx"
args = ["@playwright/mcp@latest"]

Gemini CLI. Install as an extension (gemini-extension.json bundles both the MCP server and the skill as context), or add the server to ~/.gemini/settings.json:

{
  "mcpServers": {
    "playwright": { "command": "npx", "args": ["@playwright/mcp@latest"] }
  }
}

GitHub Copilot CLI. The plugin lives at .github/plugin/plugin.json and bundles the MCP. To wire it manually, add to ~/.copilot/mcp-config.json:

{
  "mcpServers": {
    "playwright": { "type": "local", "command": "npx", "args": ["@playwright/mcp@latest"], "tools": ["*"] }
  }
}

OpenCode. opencode.json bundles the MCP. Note that command is an array, and -y avoids the interactive npx prompt:

{
  "mcp": {
    "playwright": { "type": "local", "command": ["npx", "-y", "@playwright/mcp@latest"], "enabled": true }
  }
}

Configure

1. Point it at your app

Copy templates/naive-user.config.json into your app repo's root and fill it in:

{
  "app": "myapp",
  "baseUrl": "http://localhost:3000",
  "startCommand": null,
  "auth": { "steps": ["Go to /", "Type the dev username", "Submit"], "critiqueLoginPage": false },
  "commitFindings": "ask",
  "coverageNotes": "Optional hints, not a script."
}

startCommand replaces any "how do I start the app" step. Set it and the agent runs it. Leave it null and the agent assumes the app is already up at baseUrl. See examples/notes-app/ for a fully worked config.
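
As a sketch of the first case, a config that lets the agent start the app itself might look like the following. The command npm run dev is only an assumption about how your app starts, and the string form of startCommand is inferred from the template's null placeholder; every other key is copied from the template.

```json
{
  "app": "myapp",
  "baseUrl": "http://localhost:3000",
  "startCommand": "npm run dev",
  "auth": { "steps": ["Go to /", "Type the dev username", "Submit"], "critiqueLoginPage": false },
  "commitFindings": "ask",
  "coverageNotes": "Optional hints, not a script."
}
```

Keep baseUrl pointing at wherever that command serves the app, since the agent still navigates there once the app is up.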

2. Pick a device, or go headless

browser.device is optional. Add "browser": { "device": "mobile" } (or tablet / desktop) to your config and the agent issues one browser_resize to that viewport after the first navigation; leave it out and the run uses the browser as launched.

| device  | viewport   |
|---------|------------|
| desktop | 1280 × 800 |
| tablet  | 820 × 1180 |
| mobile  | 390 × 844  |
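
For instance, a run at the mobile viewport could use a config like the sketch below. Placing browser at the top level is inferred from the browser.device key name; the remaining keys are the template's.

```json
{
  "app": "myapp",
  "baseUrl": "http://localhost:3000",
  "startCommand": null,
  "browser": { "device": "mobile" },
  "auth": { "steps": ["Go to /", "Type the dev username", "Submit"], "critiqueLoginPage": false },
  "commitFindings": "ask",
  "coverageNotes": "Optional hints, not a script."
}
```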

This emulates viewport size only, which is enough to exercise CSS-responsive layouts and mobile nav. It does not set a mobile user-agent, touch support, or device pixel ratio. For apps that gate on those, emulate a real device at the server level by adding --device "iPhone 14" to your Playwright MCP args (that sets a real viewport, user-agent, touch, and pixel ratio), and leave browser.device unset so the run does not resize over it.
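
A sketch of the server-level alternative, written in the mcpServers shape shown for Gemini CLI in the Install section; adapt it to your harness's own format. The flag and its value go in as separate args entries.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--device", "iPhone 14"]
    }
  }
}
```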

Headless is a Playwright MCP server setting, not a config key. The browser runs headed by default, which is handy for watching it drive. For unattended, subagent, or CI runs, launch the server with --headless by adding the flag to the args array wherever your harness declares the server (the How it ships table maps each location), e.g. "args": ["@playwright/mcp@latest", "--headless"]. OpenCode has no args key; append the flag to its command array instead: ["npx", "-y", "@playwright/mcp@latest", "--headless"].

The Claude Code plugin bundles the server headed. To run it headless, declare your own playwright server (the non-plugin setup above) in your project's .mcp.json with --headless in args; a project-scope definition takes precedence over the plugin's bundled one.
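
A minimal sketch of such a project-scope .mcp.json, assuming the standard mcpServers layout and reusing the playwright server name so that it stands in for the bundled definition:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}
```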

3. Keep the knowledge base reviewable

The mental model and findings only compound if they land in git, so at the end of a run the agent reconciles what it wrote under qa/naive-user/<app>/. commitFindings controls how (default "ask"):

  • "ask": show you the changed files and ask before committing them.
  • "auto": commit without asking (for unattended or subagent runs that cannot be prompted).
  • "off": never touch git; leave the changes in your working tree.

It stages only qa/naive-user/<app>/, never your other changes. Commit the markdown, but ignore the screenshot evidence by adding this line to your app repo's .gitignore:

qa/naive-user/*/screenshots/
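
For an unattended or subagent run, the config might look like the sketch below. It changes only commitFindings relative to the template, so the agent commits its knowledge-base updates without prompting.

```json
{
  "app": "myapp",
  "baseUrl": "http://localhost:3000",
  "startCommand": null,
  "auth": { "steps": ["Go to /", "Type the dev username", "Submit"], "critiqueLoginPage": false },
  "commitFindings": "auto",
  "coverageNotes": "Optional hints, not a script."
}
```

Such runs usually also want the --headless server flag described under "Pick a device, or go headless".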

Usage

/naive-test [app]

With no argument it uses the app from naive-user.config.json. The agent loads the prior mental model, makes sure the app is running, signs in, explores the live UI source-blind, writes an updated mental-model.md plus a dated findings report, and closes the browser when done. Run it on demand while developing, or dispatch it as a subagent to run in parallel with other work.

Behavior that has changed since the previous run is flagged as a regression.

Example output

A findings file leads with a one-line summary and a severity-sorted table, then one entry per gap:

| # | Severity | Surface | Gap |
|---|----------|---------|-----|
| 1 | bug      | Capture | Pressing Enter in the title field reloads the page |
| 2 | a11y     | Sidebar | Active nav item has no visible focus ring |

## 1. Pressing Enter reloads instead of saving
- Expected: Enter submits the form (primary-button convention).
- Did: Typed a title, pressed Enter.
- Observed: Full page reload, draft lost.
- Severity: bug
- Repro: 1. Open /. 2. Type in title. 3. Press Enter.
- Screenshot: screenshots/capture-enter-before.png

How it ships

Core content lives once:

  • skills/naive-user/SKILL.md holds the source-blind testing methodology (config-driven).
  • commands/naive-test.md (plus .toml for Codex and OpenCode) is the /naive-test entry point.

Each harness gets a thin manifest that points at those files and declares the Playwright MCP in that harness's native format. No content is duplicated:

| Harness | Manifest | MCP declared in |
|---------|----------|-----------------|
| Claude Code | .claude-plugin/plugin.json (plus marketplace.json) | plugin mcpServers / .mcp.json |
| Codex | .codex-plugin/plugin.json | ~/.codex/config.toml |
| Gemini CLI | gemini-extension.json | extension mcpServers / settings.json |
| Copilot CLI | .github/plugin/plugin.json | plugin mcpServers / ~/.copilot/mcp-config.json |
| OpenCode | opencode.json | opencode.json mcp |

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for the workflow, and SECURITY.md to report a vulnerability privately.

License

MIT. See LICENSE.
