# smart-screenshot-qa

> Efficient frontend QA using the right verification method. Use when doing browser-based QA, taking screenshots, verifying UI changes, or when screenshot loops are wasting tokens. Optimizes between DOM inspection, targeted zoom, and full screenshots.

- Author: Thomas Hoffman
- Repository: tghoffdev/smart-screenshot-qa-skill
- Version: 20260202171553
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/tghoffdev/smart-screenshot-qa-skill
- Web: https://mule.run/skillshub/@@tghoffdev/smart-screenshot-qa-skill~smart-screenshot-qa:20260202171553

---

---
name: smart-screenshot-qa
description: Efficient frontend QA using the right verification method. Use when doing browser-based QA, taking screenshots, verifying UI changes, or when screenshot loops are wasting tokens. Optimizes between DOM inspection, targeted zoom, and full screenshots.
---

# Smart Screenshot QA

Stop the screenshot spirals. Choose the right verification method for the job.

## When NOT to Use This Skill

This skill optimizes for verify-and-move-on. Skip it when:
- Visual regression testing requires systematic before/after comparisons
- Accessibility audits need comprehensive coverage
- QA specs explicitly require thoroughness over efficiency

## Token Costs (Tested on GitHub.com)

| Method | Tokens | Best For |
|--------|--------|----------|
| Targeted zoom (region) | 100-200 | Component styling, specific elements |
| `find` (natural language) | 500-1,000 | "Is there a submit button?" without full DOM |
| `read_page` (filter: "interactive") | 800-2,000 | Structural checks, element existence |
| Full viewport screenshot | 1,000-1,500 | Layout verification, final sign-off |
| `read_page` (full DOM) | 6,000-25,000+ | Avoid on complex pages |

**Key insight:** Filtered DOM reads and full screenshots are comparable in cost. The real wins are elsewhere: targeted zoom is 5-10x cheaper than a full screenshot, and a full DOM dump can cost 15-20x more than a screenshot. For structural questions, DOM inspection is faster, not necessarily cheaper.
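To make the comparison concrete, here is a minimal sketch that stores the table's ranges and compares methods by midpoint. The dictionary keys are hypothetical labels for this example, not tool identifiers, and the open-ended "25,000+" is capped at 25,000 as an assumption.

```python
# Token ranges (low, high) copied from the table above.
# Keys are illustrative labels only; "25,000+" is capped at 25,000 here.
TOKEN_COSTS = {
    "zoom": (100, 200),
    "find": (500, 1_000),
    "read_page_interactive": (800, 2_000),
    "full_screenshot": (1_000, 1_500),
    "read_page_full": (6_000, 25_000),
}


def midpoint(method: str) -> float:
    """Midpoint of a method's token range, used as a rough point estimate."""
    low, high = TOKEN_COSTS[method]
    return (low + high) / 2


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more tokens `expensive` costs than `cheap`, by midpoint."""
    return midpoint(expensive) / midpoint(cheap)


print(round(cost_ratio("full_screenshot", "zoom"), 1))                   # 8.3
print(round(cost_ratio("read_page_interactive", "full_screenshot"), 2))  # 1.12
```

By midpoint, a full screenshot costs roughly 8x a targeted zoom, while a filtered `read_page` and a full screenshot land within about 12% of each other.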

## Decision Tree

**"Is there a submit button somewhere?" (don't know the selector)**
→ `find` with natural language. Mid-cost, no selector needed.

**"Does element X exist / have correct text / attributes?" (know what you're looking for)**
→ `read_page` with `filter: "interactive"`. Faster than screenshots for structural checks.

**"Does this button/component look right?"**
→ `zoom` on that specific region. 5-10x cheaper than full screenshot.

**"Is the layout/spacing/alignment correct?"**
→ Single full screenshot, but only after batching changes.

**"Final check before shipping"**
→ One full screenshot. Done. Move on.
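The decision tree above can be read as a plain lookup from question type to method. The sketch below encodes it that way; the `Question` enum and `choose_method` function are hypothetical names introduced for illustration, and the returned strings are descriptions, not tool calls.

```python
from enum import Enum


class Question(Enum):
    """Hypothetical labels for the five questions in the decision tree."""
    FIND_UNKNOWN_ELEMENT = "is there a [thing] somewhere?"
    CHECK_KNOWN_ELEMENT = "does element X exist / have correct text?"
    COMPONENT_LOOK = "does this component look right?"
    LAYOUT = "is the layout/spacing/alignment correct?"
    FINAL_CHECK = "final check before shipping"


# One method per question, mirroring the arrows in the decision tree.
METHOD_FOR = {
    Question.FIND_UNKNOWN_ELEMENT: "find (natural language)",
    Question.CHECK_KNOWN_ELEMENT: 'read_page with filter: "interactive"',
    Question.COMPONENT_LOOK: "zoom on the specific region",
    Question.LAYOUT: "single full screenshot, after batching changes",
    Question.FINAL_CHECK: "one full screenshot, then stop",
}


def choose_method(question: Question) -> str:
    """Return the verification method the decision tree recommends."""
    return METHOD_FOR[question]


print(choose_method(Question.COMPONENT_LOOK))  # zoom on the specific region
```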

## Anti-Patterns (The Real Token Killers)

1. **Screenshot loops** - Taking the same screenshot over and over "to make sure." One verification that passes = done.
2. **Full screenshot for one component** - Use targeted zoom. 5-10x cheaper.
3. **Screenshot after every change** - Batch 3-5 styling/content changes, then one screenshot. For complex layout changes (flexbox, z-index, grid), verify sooner to isolate regressions.
4. **Retaking screenshots you already have** - Reference existing imageId if nothing visual changed.
5. **Full DOM dump on complex pages** - Always use `filter: "interactive"` (60-80% savings).
6. **Full-page scroll screenshots** - Use zoom for specific sections.
7. **Retrying failed zooms** - If `zoom` returns empty or ambiguous, fall back to full screenshot immediately. Don't retry.
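The batching rule in item 3 can be sketched as a small counter. This is an illustration under assumptions: the `ChangeBatcher` class is hypothetical, the default batch size of 3 is simply the low end of the 3-5 range, and the set of layout-sensitive change kinds contains only the examples item 3 names.

```python
# Change kinds that item 3 says should be verified sooner.
LAYOUT_SENSITIVE = {"flexbox", "z-index", "grid"}


class ChangeBatcher:
    """Tracks pending changes and says when one screenshot is due."""

    def __init__(self, batch_size: int = 3) -> None:
        # Assumed default: the low end of the 3-5 range from item 3.
        self.batch_size = batch_size
        self.pending: list[str] = []

    def record(self, kind: str) -> bool:
        """Record a change; return True if a screenshot should be taken now."""
        self.pending.append(kind)
        due = kind in LAYOUT_SENSITIVE or len(self.pending) >= self.batch_size
        if due:
            self.pending.clear()  # one screenshot covers everything pending
        return due


batcher = ChangeBatcher()
print(batcher.record("color"))      # False: keep batching
print(batcher.record("padding"))    # False: keep batching
print(batcher.record("font-size"))  # True: batch of 3 reached
print(batcher.record("grid"))       # True: layout change, verify right away
```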

## Why Loops Are The Real Problem

Individual screenshots aren't that expensive (700-1,500 tokens). The problem is spirals:

```
change → screenshot → tweak → screenshot → "let me check" → screenshot → "one more" → screenshot
```

5 unnecessary screenshots = 3,500-7,500 wasted tokens. Per component. Per session.
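The arithmetic behind that figure is just the per-screenshot range multiplied by the number of redundant screenshots. A minimal sketch, where `wasted_tokens` is a hypothetical helper name:

```python
SCREENSHOT_TOKENS = (700, 1_500)  # per-screenshot range quoted above


def wasted_tokens(extra_screenshots: int) -> tuple[int, int]:
    """Return the (low, high) token cost of redundant screenshots."""
    low, high = SCREENSHOT_TOKENS
    return extra_screenshots * low, extra_screenshots * high


print(wasted_tokens(5))  # (3500, 7500)
```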

## Before/After Comparisons

You cannot retroactively view old imageIds. They persist for the browser session, but they disappear from context once the context is summarized.

For a before/after comparison:
1. Take the "before" screenshot and note its key visual details in text (spacing, colors, positions)
2. Make changes
3. Take the "after" screenshot and compare against your notes

If you need to reference a screenshot later in a long session, note the key details in text immediately after taking it.
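One way to keep such notes comparable is to record them as key/value pairs and diff them after the change. The sketch below assumes that note format; `diff_notes` is a hypothetical helper, and the property names and values are made up for illustration.

```python
def diff_notes(
    before: dict[str, str], after: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return properties whose noted value changed, as {name: (before, after)}."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key, "<not noted>"), after.get(key, "<not noted>"))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Illustrative notes written right after each screenshot (values are invented).
before = {"button padding": "8px", "header color": "dark gray"}
after = {"button padding": "12px", "header color": "dark gray"}

print(diff_notes(before, after))  # {'button padding': ('8px', '12px')}
```

Because the notes live in text, the comparison still works after the original image has dropped out of context.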

## Exit Criteria (When QA is Done)

Stop when:
- The specific visual or structural property requested has been verified
- No relevant console errors (`read_console_messages`)
- Interactive elements respond correctly (if applicable)

Do NOT keep screenshotting to make it "perfect" or to "double-check."
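The exit criteria amount to a three-part boolean check. A minimal sketch, assuming the results of each check have already been collected; `QAState` and `qa_done` are hypothetical names, and how the values are gathered (for example from `read_console_messages`) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAState:
    """Hypothetical summary of what has been verified so far."""
    property_verified: bool                 # the requested property checked out
    relevant_console_errors: int            # count of errors that matter here
    interactions_ok: Optional[bool] = None  # None means "not applicable"


def qa_done(state: QAState) -> bool:
    """True when every exit criterion is met; no further screenshots needed."""
    return (
        state.property_verified
        and state.relevant_console_errors == 0
        and state.interactions_ok is not False
    )


print(qa_done(QAState(property_verified=True, relevant_console_errors=0)))  # True
```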

## Quick Reference

```
"Is there a [thing]?"  → find (natural language, mid-cost)
Element exists?        → read_page filter:"interactive" (fastest for structural)
Text/attribute check   → read_page filter:"interactive"
Component styling      → zoom (5-10x cheaper than full screenshot)
Layout verification    → single screenshot after batching
Final sign-off         → one screenshot, then stop
Full DOM               → avoid (can cost 15-20x more than screenshot)
```