# macos-qa-loop

> Autonomous QA verification loop for macOS applications. Builds the app, screenshots every screen, compares against design mocks using vision, verifies spec compliance, and loops until the app matches or reports remaining gaps. Use when validating a macOS app against its specification and design mocks before release. Triggers on "verify app against mocks", "QA loop", "check app matches spec", "validate macOS app".

- Author: eovidiu
- Repository: eovidiu/agents-skills
- Version: 20260204225927
- Stars: 2
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/eovidiu/agents-skills
- Web: https://mule.run/skillshub/@@eovidiu/agents-skills~macos-qa-loop:20260204225927

---

---
name: macos-qa-loop
author: Ovidiu Eftimie
description: Autonomous QA verification loop for macOS applications. Builds the app, screenshots every screen, compares against design mocks using vision, verifies spec compliance, and loops until the app matches or reports remaining gaps. Use when validating a macOS app against its specification and design mocks before release. Triggers on "verify app against mocks", "QA loop", "check app matches spec", "validate macOS app".
---

# macOS QA Verification Loop

## Overview

This skill runs an autonomous quality assurance loop that verifies a macOS application matches its specification and design mocks. It builds the app, captures screenshots of every relevant screen, compares them against design mocks using Claude's vision capabilities, checks spec requirements, and produces a structured gap report. The loop repeats after each round of fixes until the app is fully compliant or, failing that, reports exactly which gaps remain.

This is the bridge between "tests pass" and "it's actually correct."

## When to Use This Skill

Use this skill when:
- A macOS app implementation is complete (or near-complete) and needs verification against spec
- Design mocks exist (screenshots, Figma exports, or reference images) and you need to confirm the built app matches them
- You want automated visual regression checking during development
- You need a structured gap report showing what matches spec and what doesn't
- You are running final pre-release QA before handing off to `macos-cicd-distributor`

**Do NOT use this skill when:**
- You're still writing initial code (use `macos-tdd-expert` for TDD during development)
- No spec or mocks exist (nothing to verify against)
- The app doesn't build yet (fix build errors first)

**Trigger phrases:**
- "Verify this app against the mocks"
- "Run QA loop on the macOS app"
- "Check if the app matches the spec"
- "Compare the app against design mocks"
- "Is the app done per spec?"

## Prerequisites

Before running the QA loop, ensure:

1. **Spec exists** — A specification document describing the app's requirements (markdown, structured text, or `project-curator` format)
2. **Mocks exist** — Design reference images for each screen/state (PNG, JPEG, or PDF)
3. **App builds** — `xcodebuild` succeeds without errors
4. **Xcode project path** — Know the `.xcodeproj` or `.xcworkspace` location
5. **Scheme name** — The Xcode scheme to build and run

## The QA Loop

### Phase 1: Intake — Parse Spec and Mocks

Read the specification and produce a **verification checklist**: a structured list of every testable requirement.

```markdown
## Verification Checklist

| ID   | Requirement                          | Type     | Screen    | Status  |
|------|--------------------------------------|----------|-----------|---------|
| R001 | Sidebar shows 5 navigation items     | Visual   | Main      | pending |
| R002 | Clicking "Settings" opens prefs pane | Behavior | Main      | pending |
| R003 | Dark mode inverts all backgrounds    | Visual   | Main+Prefs| pending |
| R004 | Export button is disabled when empty | State    | Main      | pending |
```

**Types:**
- **Visual** — Verified by comparing screenshot against mock
- **Behavior** — Verified by interacting with the app (XCUITest or accessibility API)
- **State** — Verified by checking specific UI states (enabled/disabled, visible/hidden)
- **Data** — Verified by checking displayed data matches expected values

**Reference**: `references/spec-checklist-format.md` — Detailed guide on parsing specs into verification checklists.
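As a rough illustration of how the checklist can drive later phases, the sketch below lists the requirement IDs that currently have a given status. The function name `ids_with_status` is hypothetical, and the code assumes the checklist is saved as a markdown file using exactly the five columns shown above, with IDs of the form `R001`.

```bash
# Print the IDs of checklist rows whose Status column equals $2.
# Assumes the five-column layout: ID, Requirement, Type, Screen, Status.
# Usage: ids_with_status <checklist-file> <status>
ids_with_status() {
  awk -F'|' -v want="$2" '
    NF >= 7 {
      id = $2
      status = $6
      # Trim the padding spaces around each cell
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      # IDs look like R001, which skips the header and separator rows
      if (id ~ /^R[0-9]+$/ && status == want) print id
    }
  ' "$1"
}
```

For example, `ids_with_status checklist.md pending` would print one ID per line for every requirement that has not been verified yet (`checklist.md` is a placeholder filename).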

### Phase 2: Build

Build the application using `xcodebuild`:

```bash
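# Workspaces need -workspace; plain projects use -project
case "$PROJECT_PATH" in
  *.xcworkspace) CONTAINER_FLAG="-workspace" ;;
  *)             CONTAINER_FLAG="-project" ;;
esac
xcodebuild "$CONTAINER_FLAG" "$PROJECT_PATH" \
  -scheme "$SCHEME" \
  -configuration Debug \
  -derivedDataPath "$DERIVED_DATA" \
  build 2>&1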
```

If build fails, STOP. Report the build error. Do not proceed to screenshots.

**Build output location:**
```bash
APP_PATH="$DERIVED_DATA/Build/Products/Debug/$APP_NAME.app"
```
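The "STOP on build failure" rule can be enforced with a small wrapper that checks the build command's exit status. This is a minimal sketch: `run_build` and the log path are hypothetical names, and it relies only on `xcodebuild` exiting non-zero when a build fails.

```bash
# Run a build command, keep its full log, and report failure to the caller.
# Usage: run_build <log-file> <command> [args...]
run_build() {
  build_log="$1"
  shift
  if "$@" >"$build_log" 2>&1; then
    echo "BUILD OK"
  else
    build_status=$?
    echo "BUILD FAILED (exit $build_status)"
    # Compiler diagnostics in the log contain "error:"; show those lines,
    # or fall back to the tail of the log if none are found
    grep 'error:' "$build_log" || tail -n 20 "$build_log"
    return "$build_status"
  fi
}

# Example (macOS only):
# run_build "$OUTPUT_DIR/build.log" xcodebuild -project "$PROJECT_PATH" \
#   -scheme "$SCHEME" -derivedDataPath "$DERIVED_DATA" build || exit 1
```

Passing the build command as arguments keeps the wrapper independent of the exact `xcodebuild` flags used.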

### Phase 3: Launch and Screenshot

Launch the app and capture screenshots of every screen/state defined in the mocks:

```bash
# Launch the app
open "$APP_PATH"
sleep 2  # Allow launch animation to complete

# Capture the main window
screencapture -l $(osascript -e "tell app \"$APP_NAME\" to id of window 1") \
  "$OUTPUT_DIR/screen-main.png"
```

For multi-screen apps, navigate to each screen and capture:

```bash
# Use osascript to navigate via accessibility
osascript -e '
  tell application "System Events"
    tell process "AppName"
      click menu item "Preferences..." of menu "AppName" of menu bar 1
    end tell
  end tell
'
sleep 1
screencapture -l $(osascript -e "tell app \"$APP_NAME\" to id of window 1") \
  "$OUTPUT_DIR/screen-preferences.png"
```

**Reference**: `references/verification-workflow.md` — Complete screenshot capture workflow including window targeting, state navigation, and multi-monitor handling.

**Script**: `scripts/screenshot-app.sh` — Ready-to-use build + launch + screenshot utility.
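The fixed `sleep` calls above may be too short on a slow machine and needlessly long on a fast one. One possible alternative, sketched here, is to poll until a readiness check succeeds. `wait_until` is a hypothetical helper and is not claimed to be part of `scripts/screenshot-app.sh`.

```bash
# Poll a command until it succeeds or the attempts run out.
# Usage: wait_until <max-attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts_left="$1"
  delay="$2"
  shift 2
  while [ "$attempts_left" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts_left=$((attempts_left - 1))
    sleep "$delay"
  done
  return 1
}

# Example (macOS only): wait up to about 10 seconds for the first window
# wait_until 20 0.5 osascript -e "tell app \"$APP_NAME\" to id of window 1"
```

A non-zero return after the last attempt can be treated like a screenshot capture failure for that screen.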

### Phase 4: Visual Comparison

Compare each captured screenshot against its corresponding design mock using Claude's vision.

**How this works:**
1. Read the design mock image (the reference)
2. Read the captured screenshot (the actual)
3. Ask Claude to compare them semantically

**Comparison dimensions:**
- **Layout** — Element positioning, spacing, alignment
- **Typography** — Font sizes, weights, line heights (approximate)
- **Colors** — Background, text, accent colors match
- **Components** — All expected UI elements are present
- **States** — Correct enabled/disabled, selected/unselected states
- **Content** — Placeholder text, icons, labels match expectations

**Comparison output format:**
```markdown
### Screen: Main Window

**Mock**: mocks/main-window.png
**Actual**: screenshots/screen-main.png

| Aspect     | Match | Issue                                         |
|------------|-------|-----------------------------------------------|
| Layout     | ⚠️    | Sidebar is 20px wider than mock               |
| Typography | ✅    | —                                             |
| Colors     | ✅    | —                                             |
| Components | ❌    | Missing "Export" button in toolbar            |
| States     | ✅    | —                                             |
| Content    | ⚠️    | Placeholder text still shows "Lorem ipsum"    |
```

**Reference**: `references/visual-comparison-guide.md` — Detailed comparison methodology, tolerance thresholds, and edge cases.

### Phase 5: Behavioral Verification

For requirements typed as **Behavior** or **State**, verify by interacting with the running app:

**Option A: XCUITest (preferred if tests exist)**
```bash
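# Workspaces need -workspace; plain projects use -project
case "$PROJECT_PATH" in
  *.xcworkspace) CONTAINER_FLAG="-workspace" ;;
  *)             CONTAINER_FLAG="-project" ;;
esac
xcodebuild test \
  "$CONTAINER_FLAG" "$PROJECT_PATH" \
  -scheme "${UI_TEST_SCHEME:-${SCHEME}UITests}" \
  -configuration Debug \
  -derivedDataPath "$DERIVED_DATA" \
  2>&1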
```

**Option B: Accessibility API via osascript**
```applescript
tell application "System Events"
  tell process "AppName"
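    -- `exists` is a command, not a property, so test it before reading state
    if not (exists button "Export" of toolbar 1 of window 1) then
      return {buttonExists:false, buttonEnabled:false}
    end if
    set exportBtn to button "Export" of toolbar 1 of window 1
    return {buttonExists:true, buttonEnabled:(enabled of exportBtn)}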
  end tell
end tell
```

**Option C: Manual verification prompt**
When automation cannot verify a requirement, produce a clear manual verification step:
```markdown
**Manual Check Required:**
- [ ] R007: "Drag and drop reorders sidebar items" — Launch app, drag item 3 above item 1, verify order persists after restart
```

### Phase 6: Gap Report

Produce a structured report combining all verification results:

```markdown
# QA Verification Report

**App**: MyApp v1.0
**Date**: 2025-02-01
**Spec**: docs/spec.md
**Mocks**: mocks/

## Summary

| Status | Count |
|--------|-------|
| ✅ Pass   | 12    |
| ⚠️ Partial | 3     |
| ❌ Fail   | 2     |
| ⏭️ Skipped | 1     |
| **Total** | **18** |

**Verdict**: NOT READY — 2 failures, 3 partial matches

## Failures

### R005: Export button in toolbar
- **Expected**: Toolbar contains "Export" button with document.arrow.up icon
- **Actual**: Button is missing from toolbar
- **Fix**: Add NSToolbarItem with identifier "export" to toolbar configuration

### R011: Dark mode background colors
- **Expected**: All backgrounds invert to #1E1E1E in dark mode
- **Actual**: Sidebar background stays white (#FFFFFF)
- **Fix**: Set the sidebar background to the semantic .windowBackgroundColor instead of a hard-coded color

## Partial Matches

### R001: Sidebar width
- **Expected**: 240px fixed width
- **Actual**: ~260px (approximately 20px wider)
- **Fix**: Check sidebar width constraint, should be 240

## Passes
[List of all passing requirements with ✅]

## Manual Checks Required
- [ ] R007: Drag and drop reorder (cannot automate)
```

**Template**: `assets/templates/qa-report.md` — Copy-ready gap report template.
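The Summary table and Verdict line can be derived mechanically once each requirement has a final status. The sketch below assumes a hypothetical intermediate file with one `<ID> <status>` pair per line, where status is `pass`, `partial`, `fail`, or `skipped`. It also assumes skipped items do not block the verdict, which the report format above does not specify.

```bash
# Tally "<ID> <status>" lines and print the counts plus a verdict.
# Usage: summarize_results <results-file>
summarize_results() {
  awk '
    { count[$2]++ }
    END {
      printf "pass=%d partial=%d fail=%d skipped=%d total=%d\n",
        count["pass"], count["partial"], count["fail"], count["skipped"], NR
      if (count["fail"] + count["partial"] == 0) {
        print "Verdict: READY"
      } else {
        printf "Verdict: NOT READY (%d failures, %d partial matches)\n",
          count["fail"], count["partial"]
      }
    }
  ' "$1"
}
```

Fed the counts from the sample report (12 pass, 3 partial, 2 fail, 1 skipped), this prints a total of 18 and a NOT READY verdict.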

### Phase 7: Loop Decision

After producing the gap report:

**If all pass (✅ only):**
```
✅ QA COMPLETE — App matches spec and mocks.
Ready for macos-cicd-distributor.
```

**If failures exist (❌ or ⚠️):**
```
Send the gap report to the coding agent / orchestrator.
Wait for fixes.
Re-run from Phase 2 (build).
```

**Loop limits:**
- Maximum 5 iterations per QA session
- If iteration 5 still has failures, STOP and escalate to Ovidiu
- Each iteration should fix at least 1 issue. If 2 consecutive iterations make no progress, STOP
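The loop limits can be expressed as a small decision function. This is a sketch under one stated assumption: "no progress" is read as two consecutive iterations in which the number of open gaps (❌ plus ⚠️) did not decrease. `loop_decision` and its output strings are hypothetical.

```bash
# Decide what to do after the latest iteration.
# Usage: loop_decision <max-iterations> <open-gaps-iter-1> [<open-gaps-iter-2> ...]
# Each value after the first is the count of open gaps left after that iteration;
# at least one such value is required.
loop_decision() {
  max_iterations="$1"
  shift
  iteration=$#
  prev=""
  stalled=0
  for gaps in "$@"; do
    # An iteration is stalled when it closed no gap compared with the previous one
    if [ -n "$prev" ] && [ "$gaps" -ge "$prev" ]; then
      stalled=$((stalled + 1))
    else
      stalled=0
    fi
    prev="$gaps"
  done
  if [ "$prev" -eq 0 ]; then
    echo "COMPLETE"
  elif [ "$stalled" -ge 2 ]; then
    echo "STOP: no progress after 2 iterations"
  elif [ "$iteration" -ge "$max_iterations" ]; then
    echo "STOP: max iterations reached"
  else
    echo "CONTINUE"
  fi
}
```

For instance, `loop_decision 5 5 3` continues the loop, while `loop_decision 5 5 5 5` stops because two iterations in a row left the gap count at 5.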

## Verification Dimensions

### Visual Fidelity (What You See)

Compares the rendered app against design mocks across these axes:

| Axis | What to Check | Tolerance |
|------|--------------|-----------|
| Layout | Element positions, spacing, alignment | ±5px |
| Sizing | Component widths, heights | ±5px |
| Colors | Backgrounds, text, accents | Semantic match (not pixel-exact) |
| Typography | Font size, weight, style | Approximate match |
| Icons | Correct icon, correct size | Present/absent + approximate |
| Shadows/Effects | Drop shadows, blur, vibrancy | Present/absent |

**Tolerance philosophy:** This is semantic comparison, not pixel diffing. "The sidebar is roughly the right width and has the right items" matters more than "the sidebar is exactly 240.0px." Flag significant deviations, ignore rendering engine differences.
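When a numeric measurement happens to be available (for example, a width estimated during comparison), the ±5px rule reduces to an absolute-difference check. The helper below is a hypothetical sketch for integer values and does not replace the semantic comparison described above.

```bash
# Succeed when |actual - expected| <= tolerance (integer values, default 5).
# Usage: within_tolerance <expected> <actual> [tolerance]
within_tolerance() {
  expected="$1"
  actual="$2"
  tolerance="${3:-5}"
  diff=$((actual - expected))
  # Take the absolute value of the difference
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))
  fi
  [ "$diff" -le "$tolerance" ]
}
```

With the sidebar example from the gap report, `within_tolerance 240 260` fails because the 20px deviation exceeds the default tolerance, while `within_tolerance 240 243` succeeds.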

### Spec Compliance (What It Does)

Every requirement in the spec maps to a verification:

| Requirement Type | Verification Method |
|-----------------|-------------------|
| "User can X" | Behavioral test (XCUITest or accessibility) |
| "Screen shows X" | Visual comparison |
| "X is disabled when Y" | State check via accessibility API |
| "Data persists after Z" | Behavioral test with app restart |
| "Performance: X under Y ms" | Instrumented timing |

### Behavioral Correctness (How It Works)

For interactive requirements, verify the app responds correctly:
- Navigation between screens
- Button actions produce expected results
- Menu items trigger correct behavior
- Keyboard shortcuts work
- Window resize behavior matches expectations

## Integration with Other Skills

```
┌─────────────────────────────────────────────────────────┐
│                    Development Flow                      │
│                                                          │
│  macos-senior-engineer  →  Writes the code               │
│  macos-tdd-expert       →  TDD during development        │
│  macos-qa-loop          →  Verifies against spec & mocks │
│  macos-cicd-distributor →  Signs and ships                │
│                                                          │
│  project-curator        →  Provides the spec             │
│  macos-senior-ux        →  Provides the design rationale │
│  macos-app-architect    →  Provides architecture context │
└─────────────────────────────────────────────────────────┘
```

**Handoff from `macos-tdd-expert`:** When unit/integration tests pass, invoke `macos-qa-loop` to verify the built app visually and behaviorally.

**Handoff to `macos-cicd-distributor`:** When QA loop reports all-pass, the app is ready for signing, notarization, and distribution.

## Workflow Quick Start

### Step 1: Gather Inputs

```bash
# Identify your inputs
PROJECT_PATH="./MyApp.xcodeproj"  # or .xcworkspace
SCHEME="MyApp"
SPEC_PATH="./docs/spec.md"
MOCKS_DIR="./mocks/"
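OUTPUT_DIR="./qa-output/"
APP_NAME="MyApp"                          # app bundle name, used in Phases 2 and 3
DERIVED_DATA="./qa-output/DerivedData"    # example location for -derivedDataPath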

mkdir -p "$OUTPUT_DIR"
```

### Step 2: Run the Loop

Invoke the skill:
```
Verify MyApp against spec at docs/spec.md and mocks in mocks/ directory.
Project: MyApp.xcodeproj, Scheme: MyApp
```

The skill will:
1. Parse the spec into a verification checklist
2. Build the app
3. Screenshot every screen referenced in mocks
4. Compare screenshots against mocks
5. Run behavioral checks
6. Produce a gap report
7. Loop if needed

### Step 3: Review Report

The gap report lands in `$OUTPUT_DIR/qa-report.md`. Review and either:
- Accept (all pass) → proceed to distribution
- Fix (failures exist) → re-run after fixes

## Configuration

### Required Inputs

| Input | Description | Example |
|-------|------------|---------|
| `PROJECT_PATH` | Path to .xcodeproj or .xcworkspace | `./MyApp.xcodeproj` |
| `SCHEME` | Xcode build scheme | `MyApp` |
| `SPEC_PATH` | Path to specification document | `./docs/spec.md` |
| `MOCKS_DIR` | Directory containing design mock images | `./mocks/` |

### Optional Inputs

| Input | Description | Default |
|-------|------------|---------|
| `OUTPUT_DIR` | Where to write reports and screenshots | `./qa-output/` |
| `MAX_ITERATIONS` | Maximum QA loop iterations | `5` |
| `CONFIGURATION` | Xcode build configuration | `Debug` |
| `UI_TEST_SCHEME` | Scheme for XCUITest suite | `${SCHEME}UITests` |
| `APPEARANCE` | Light, Dark, or Both | `Both` |
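In a shell driver, the defaults in this table could be applied with parameter expansion so that any value already set by the caller wins. This is a sketch only; the fallback for `SCHEME` is a placeholder so the snippet runs on its own, since `SCHEME` is a required input.

```bash
# Apply the documented defaults to any optional input that is unset or empty.
# SCHEME is required in practice; "MyApp" is only a placeholder for this sketch.
SCHEME="${SCHEME:-MyApp}"
OUTPUT_DIR="${OUTPUT_DIR:-./qa-output/}"
MAX_ITERATIONS="${MAX_ITERATIONS:-5}"
CONFIGURATION="${CONFIGURATION:-Debug}"
# The UI test scheme default is derived from the main scheme name
UI_TEST_SCHEME="${UI_TEST_SCHEME:-${SCHEME}UITests}"
APPEARANCE="${APPEARANCE:-Both}"
```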

### Mock File Naming Convention

Mocks should be named to match screens:

```
mocks/
├── main-window.png           # Main app window
├── main-window-dark.png      # Main window in dark mode
├── preferences.png           # Preferences pane
├── preferences-dark.png      # Preferences in dark mode
├── empty-state.png           # Main window with no data
└── error-dialog.png          # Error alert
```

The skill matches mock filenames to screen identifiers in the verification checklist.
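One way to read this convention is as a lookup from screen identifier and appearance to a file path. The sketch below assumes PNG mocks and the `-dark` suffix shown in the tree above. `mock_path_for` is a hypothetical name, and the skill's actual matching logic is not specified here.

```bash
# Resolve the mock file for a screen identifier and appearance.
# Usage: mock_path_for <mocks-dir> <screen-id> <light|dark>
mock_path_for() {
  mocks_dir="${1%/}"   # drop a trailing slash so the path has no "//"
  screen_id="$2"
  case "$3" in
    dark) candidate="${mocks_dir}/${screen_id}-dark.png" ;;
    *)    candidate="${mocks_dir}/${screen_id}.png" ;;
  esac
  if [ -f "$candidate" ]; then
    echo "$candidate"
  else
    # A missing mock means the caller skips the visual check and notes it in the report
    echo "missing mock: $candidate" >&2
    return 1
  fi
}
```

For example, `mock_path_for ./mocks/ main-window dark` would print `./mocks/main-window-dark.png` when that file exists.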

## Error Handling

| Error | Response |
|-------|----------|
| Build fails | STOP. Report build error. Do not screenshot. |
| App crashes on launch | STOP. Report crash log. |
| Screenshot capture fails | Retry once. If it still fails, report and skip the visual check for that screen. |
| Mock image missing | Skip visual comparison for that screen. Note in report. |
| Accessibility API blocked | Report permission requirement. Suggest granting in System Settings > Privacy & Security > Accessibility. |
| No progress after 2 iterations | STOP. Escalate to Ovidiu with full gap history. |
| Max iterations (5) reached | STOP. Produce final report with remaining gaps. |

## Resources

**References:**
- `references/verification-workflow.md` — Complete loop mechanics, phase transitions, and state management
- `references/visual-comparison-guide.md` — How to compare screenshots vs mocks using vision, tolerance thresholds, edge cases
- `references/spec-checklist-format.md` — How to parse specs into testable verification checklists

**Scripts:**
- `scripts/screenshot-app.sh` — Build + launch + screenshot utility for macOS apps

**Templates:**
- `assets/templates/qa-report.md` — Gap report template
- `assets/templates/spec-checklist.md` — Specification verification checklist template