# visual-qa

> Use when making UI/styling changes and need to verify no visual regressions - provides screenshot capture, comparison, vision model scoring, and video recording for animations

- Author: Christopher C. Smith
- Repository: chriscarrollsmith/chriscarrollsmith.github.io
- Version: 20260119183113
- Stars: 4
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/chriscarrollsmith/chriscarrollsmith.github.io
- Web: https://mule.run/skillshub/@@chriscarrollsmith/chriscarrollsmith.github.io~visual-qa:20260119183113

---

---
name: visual-qa
description: Use when making UI/styling changes and need to verify no visual regressions - provides screenshot capture, comparison, vision model scoring, and video recording for animations
---

# Visual QA

## Overview

Capture screenshots before/after changes, compare pixel diffs, score with vision models. Use video capture for animations.

## Quick Reference

| Task | Command |
|------|---------|
| **Capture baseline** | `bun visual_qa/capture_anchors.mjs --base-url http://localhost:4321 --start-dev --label baseline "#about" "/cv"` |
| **Capture candidate** | Same command with `--label candidate` (auto-compares to baseline) |
| **Add CSS labels** | Add `--annotate` flag to overlay selector labels |
| **Score screenshot** | `llm -m openrouter/qwen/qwen2.5-vl-32b-instruct -a image.png "$(cat visual_qa/standard_rubric.md) Score this."` |
| **Compare two images** | `llm -m ... -a baseline.png -a candidate.png "Compare. Flag regressions."` |
| **Record video** | See Animation Workflow below |

## Core Workflow

1. **Before changes**: Capture baseline screenshots
2. **Make changes**: Edit CSS/components
3. **After changes**: Capture candidate (auto-compares)
4. **Review diffs**: Only images marked as changed need review
5. **Score if needed**: Use vision model to score or compare
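
The skill does not show how `visual_qa/compare_images.mjs` decides that an image "changed". As a rough illustration only, a pixel diff can count pixels whose channels differ by more than a tolerance, then flag the image when the changed fraction exceeds a threshold. `diffRgba`, `tolerance`, and `maxChangedRatio` are hypothetical names and defaults. The inputs are assumed to be already-decoded RGBA byte arrays.

```javascript
// Hypothetical sketch: NOT the actual logic of visual_qa/compare_images.mjs.
// Decides whether a candidate screenshot "changed" relative to its baseline,
// given both images as raw RGBA byte arrays (4 bytes per pixel).
function diffRgba(baseline, candidate, { tolerance = 16, maxChangedRatio = 0.001 } = {}) {
  if (baseline.length !== candidate.length || baseline.length % 4 !== 0) {
    // Different dimensions: treat as changed without a pixel-level count.
    return { changed: true, changedPixels: null, ratio: null };
  }
  const totalPixels = baseline.length / 4;
  let changedPixels = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // A pixel counts as changed if any R, G, B, or A channel differs by more than `tolerance`.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - candidate[i + c]) > tolerance) {
        changedPixels++;
        break;
      }
    }
  }
  const ratio = totalPixels === 0 ? 0 : changedPixels / totalPixels;
  return { changed: ratio > maxChangedRatio, changedPixels, ratio };
}
```

Dedicated libraries such as pixelmatch refine this idea. For example, they detect anti-aliased pixels, so font-rendering noise does not flag an otherwise identical image.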

## Animation Workflow

**Use video capture when:** Testing animations, transitions, or interactive UI state changes.

**Do NOT use for routine testing** - video burns tokens. Screenshots suffice for static content.

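```javascript
// Standalone ES module script (run like the capture script); inside a Playwright test,
// use the provided `browser` fixture instead of launching one.
import { chromium } from 'playwright';

const browser = await chromium.launch();
const context = await browser.newContext({
  recordVideo: { dir: 'visual_qa/videos/' } // Playwright gives each video a random .webm name
});
const page = await context.newPage();
await page.goto('http://localhost:4321/#about');
await page.waitForTimeout(5000); // Capture full animation cycle
const video = page.video();
await context.close(); // Video is finalized when the context closes
await video.saveAs('visual_qa/videos/recording.webm'); // Copy to a predictable file name
await browser.close();
```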
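
The fixed `waitForTimeout(5000)` is a guess at the animation length. One option is to size the recording from the element's computed CSS animation values. The sketch below is a hypothetical helper that is not part of `visual_qa/`. It assumes you pass the raw strings from `getComputedStyle(el).animationDuration`, `animationDelay`, and `animationIterationCount`. It covers CSS animations only, not transitions.

```javascript
// Hypothetical helper (not part of visual_qa/): estimates how long to record so the
// video covers at least one full cycle of every CSS animation on an element.

// Converts a CSS <time> value such as "0.3s" or "250ms" to milliseconds.
function parseCssTimeMs(value) {
  const v = value.trim();
  if (v.endsWith('ms')) return parseFloat(v);
  if (v.endsWith('s')) return parseFloat(v) * 1000;
  throw new Error(`Unrecognized CSS time: ${value}`);
}

// Returns the longest (delay + duration * iterations) across comma-separated animations,
// plus padding. "infinite" is capped at one cycle, which is usually enough to judge timing.
function animationRecordingMs(durations, delays = '0s', iterations = '1', paddingMs = 500) {
  const split = (s) => s.split(',').map((part) => part.trim());
  const delayList = split(delays);
  const iterList = split(iterations);
  let longest = 0;
  split(durations).forEach((duration, i) => {
    // Shorter delay/iteration lists are reused cyclically, as CSS repeats them.
    const delay = parseCssTimeMs(delayList[i % delayList.length]);
    const rawIter = iterList[i % iterList.length];
    const iter = rawIter === 'infinite' ? 1 : parseFloat(rawIter);
    longest = Math.max(longest, delay + parseCssTimeMs(duration) * iter);
  });
  return Math.ceil(longest + paddingMs);
}
```

For example, the three strings could be read in the browser with Playwright's `page.$eval` and `getComputedStyle`, and the result passed to `page.waitForTimeout(...)`.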

**Analyze with vision model (use Gemini for video):**
```bash
llm -m gemini/gemini-flash-latest \
    -a visual_qa/videos/recording.webm \
    "This is a 5-second recording of a fade animation. Is the timing smooth? Any visual glitches?"
```

## Extending the Rubric

Add custom metrics (max score increases by 2 per metric):

```bash
llm -m gemini/gemini-flash-latest \
    -a visual_qa/videos/recording.webm \
    "$(cat visual_qa/standard_rubric.md)

6. Animation smoothness
   - 0: Janky, stuttering, or inconsistent timing
   - 1: Generally smooth with minor hesitations
   - 2: Buttery smooth, professional feel

Score all 6 metrics (0-12 max)."
```
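
Each metric adds 2 to the maximum. If you total scores in a script, a small sketch like the following keeps the arithmetic consistent. `rubricTotals` is hypothetical. The metric names in any input are placeholders, since `standard_rubric.md` is not reproduced here. The sketch assumes every metric is scored as an integer from 0 to 2, as in the example metric above.

```javascript
// Hypothetical sketch: totals per-metric rubric scores.
// Assumes each metric is an integer 0-2, so max = 2 * (number of metrics).
function rubricTotals(scores) {
  const entries = Object.entries(scores);
  for (const [metric, score] of entries) {
    if (!Number.isInteger(score) || score < 0 || score > 2) {
      throw new RangeError(`${metric}: expected an integer 0-2, got ${score}`);
    }
  }
  const total = entries.reduce((sum, [, score]) => sum + score, 0);
  const max = 2 * entries.length;
  // A percentage makes a 5-metric (0-10) score comparable with a 6-metric (0-12) one.
  return { total, max, percent: max === 0 ? 0 : Math.round((100 * total) / max) };
}
```

With the standard five metrics plus "Animation smoothness", `max` comes out to 12, matching the prompt above.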

## Interpreting Scores

- **Score drop with intentional content change**: Not a regression if layout remains clean
- **Score drop without content change**: Investigate specific metrics that dropped
- **Use the comparison prompt** to pinpoint specific regressions instead of relying on scores alone
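
To find which metrics dropped between a baseline and a candidate, you can diff the two sets of per-metric scores. The sketch below is a hypothetical helper, and its metric names are placeholders. It only reports the drops. Deciding whether a drop reflects an intentional content change or a real regression remains a review step.

```javascript
// Hypothetical sketch: lists rubric metrics whose score fell from baseline to candidate,
// largest drop first. Metrics missing from the candidate are skipped.
function droppedMetrics(baseline, candidate) {
  const drops = [];
  for (const [metric, before] of Object.entries(baseline)) {
    const after = candidate[metric];
    if (after === undefined) continue; // metric not scored for the candidate
    if (after < before) drops.push({ metric, before, after, drop: before - after });
  }
  return drops.sort((a, b) => b.drop - a.drop);
}
```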

## Files

- `visual_qa/capture_anchors.mjs` - Screenshot capture at all breakpoints
- `visual_qa/compare_images.mjs` - Pixel diff comparison
- `visual_qa/standard_rubric.md` - 5-metric scoring rubric (each metric 0-2, total 0-10)