# skill-review-audit

> Use when a user asks to review, interpret, or audit an AI agent skill (SKILL.md plus bundled scripts/references/assets) for capabilities, triggering behavior, tool/command usage, safety & privacy risk, supply-chain provenance, quality gaps, and improvement recommendations; also use when validating a skill before installing or deploying it.

- Author: okwinds
- Repository: okwinds/miscellany
- Version: 20260201011217
- Stars: 18
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/okwinds/miscellany
- Web: https://mule.run/skillshub/@@okwinds/miscellany~skill-review-audit:20260201011217

---

---
name: skill-review-audit
description: Use when a user asks to review, interpret, or audit an AI agent skill (SKILL.md plus bundled scripts/references/assets) for capabilities, triggering behavior, tool/command usage, safety & privacy risk, supply-chain provenance, quality gaps, and improvement recommendations; also use when validating a skill before installing or deploying it.
---

# Skill Review & Audit

Produce a **systematic, multi-dimensional review** of any skill directory (a `SKILL.md` plus optional `scripts/`, `references/`, `assets/`, and install metadata).

## Outcomes

- A clear description of what the skill teaches and *what it does not*.
- A map of **tooling + side effects** the skill may cause when followed (commands, network, file writes, permissions).
- A **risk assessment** (security, privacy, safety, supply chain) with mitigations.
- A **quality assessment** (correctness, completeness, maintainability, UX) with prioritized improvements.
- Optional scoring using `references/scoring-rubric.md`.
- A report formatted using `references/report-template.md`.

## Inputs To Request (If Missing)

- Skill identifier: name and/or filesystem path to the skill directory.
- Target agent environment (e.g. Codex CLI / Claude Code / other) and any constraints (offline, no web, sandboxed, etc.).
- Intended usage context (what kinds of user prompts should trigger it; what “done” looks like).

## Workflow (Do In Order)

### 0) Scope The Review

- Confirm whether the review is **(a)** informational only or **(b)** includes proposing patches to the skill.
- Define what “safe enough” means for the target environment (network allowed? can write files? secrets present?).
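One way to make "safe enough" concrete is to record the environment's constraints as data and check each side-effect category from step 4 against them. This is a minimal sketch. `ReviewPolicy`, its field names, and the category strings (`network`, `file_write`, `secrets`) are hypothetical, not part of this skill's references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPolicy:
    """What the target environment permits (hypothetical field names)."""
    network_allowed: bool
    file_writes_allowed: bool
    secrets_present: bool


def effects_needing_review(policy: ReviewPolicy, effects: set[str]) -> set[str]:
    """Return the side-effect categories that conflict with the policy.

    Category names ("network", "file_write", "secrets") are illustrative.
    """
    conflicts = set()
    if "network" in effects and not policy.network_allowed:
        conflicts.add("network")
    if "file_write" in effects and not policy.file_writes_allowed:
        conflicts.add("file_write")
    # Credential-touching steps only matter when real secrets are reachable.
    if "secrets" in effects and policy.secrets_present:
        conflicts.add("secrets")
    return conflicts


sandbox = ReviewPolicy(network_allowed=False, file_writes_allowed=True, secrets_present=True)
print(sorted(effects_needing_review(sandbox, {"network", "file_write", "secrets"})))
# ['network', 'secrets']
```

Writing the policy down before reading the skill keeps later risk calls tied to an explicit definition instead of a feeling.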

### 1) Inventory & Provenance

1. List the full directory tree and file sizes.
2. Identify install/provenance files (common: `.openskills.json`, `package.json`, `pyproject.toml`, git submodule markers).
3. Record:
   - Skill root path
   - Total file count
   - Presence of `scripts/`, `references/`, `assets/`
   - Any external source URL + install timestamp (if present)
4. Flag anything unexpected (executables, binaries, obfuscated blobs, huge files, symlinks that resolve outside the skill directory).
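The inventory steps can be sketched as a read-only directory walk. This is an illustration under stated assumptions, not the bundled `scan_skill.sh`: the 1 MB "huge file" threshold, the NUL-byte binary test, and adding `.gitmodules` as the submodule marker are all choices made here. Symlinks are reported but never followed.

```python
import os
import stat
import tempfile
from pathlib import Path

# Provenance files named in step 1 (plus .gitmodules for submodules); extend as needed.
PROVENANCE_FILES = {".openskills.json", "package.json", "pyproject.toml", ".gitmodules"}
LARGE_FILE_BYTES = 1_000_000  # assumed threshold for a "huge" file


def inventory(root: str) -> dict:
    """Walk a skill directory (symlinks are not followed) and record facts plus flags."""
    root_path = Path(root).resolve()
    files, flags, provenance = [], [], []
    for dirpath, dirnames, filenames in os.walk(root_path):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root_path).as_posix()
            if path.is_symlink():
                target = path.resolve()
                if target != root_path and root_path not in target.parents:
                    flags.append(f"symlink outside root: {rel}")
                continue
            if path.is_dir():
                continue
            info = path.stat()
            files.append((rel, info.st_size))
            if name in PROVENANCE_FILES:
                provenance.append(rel)
            if info.st_mode & stat.S_IXUSR:
                flags.append(f"executable: {rel}")
            if info.st_size > LARGE_FILE_BYTES:
                flags.append(f"large file: {rel} ({info.st_size} bytes)")
            with path.open("rb") as fh:  # crude binary check: NUL byte in the first 8 KB
                if b"\x00" in fh.read(8192):
                    flags.append(f"binary content: {rel}")
    return {
        "root": str(root_path),
        "file_count": len(files),
        "subdirs": {d: (root_path / d).is_dir() for d in ("scripts", "references", "assets")},
        "provenance": provenance,
        "flags": flags,
        "files": files,
    }


# Demo on a throwaway directory: one executable script and one symlink escaping the root.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "demo-skill"
    (skill / "scripts").mkdir(parents=True)
    (skill / "SKILL.md").write_text("---\nname: demo-skill\n---\n")
    script = skill / "scripts" / "run.sh"
    script.write_text("#!/bin/sh\necho hi\n")
    script.chmod(0o755)
    (skill / "escape").symlink_to(Path(tmp))
    report = inventory(str(skill))

print(report["file_count"], sorted(report["flags"]))
# 2 ['executable: scripts/run.sh', 'symlink outside root: escape']
```

Obfuscated blobs still need a human look; the flags only tell you where to start.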

Optional helper: run `scripts/scan_skill.sh` (read it first; it is intended to be read-only).
Note: `scan_skill.sh` may surface sensitive strings (e.g., tokens, private keys) depending on the target directory. Treat its output as sensitive; redact before sharing.
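Redaction can be a small filter applied to scan output before it is pasted anywhere. A minimal sketch follows; the patterns (PEM private-key blocks, AWS access key IDs, classic `ghp_` GitHub tokens, and `NAME=value` assignments whose name ends in token/secret/password/api_key) are illustrative and far from exhaustive, so still review the output by eye.

```python
import re

# Illustrative patterns only; dedicated secret scanners ship much larger rule sets.
REDACTIONS = [
    # PEM private-key blocks (RSA, EC, OPENSSH, or unlabeled)
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.DOTALL), "[REDACTED]"),
    # AWS access key IDs
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED]"),
    # Classic GitHub personal access tokens
    (re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "[REDACTED]"),
    # NAME=value or name: value where the name ends in token/secret/password/api_key; keep the name
    (re.compile(r"\b([A-Z0-9_]*(?:token|secret|password|api_key)\s*[=:]\s*)\S+", re.IGNORECASE), r"\1[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the redacted text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


sample = "export GITHUB_TOKEN=ghp_" + "a" * 36 + "\naws key AKIAABCDEFGHIJKLMNOP found\n"
print(redact(sample))
```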

### 2) Trigger Contract (Frontmatter Audit)

Read `SKILL.md` YAML frontmatter and assess:

- **Name**: unique, stable, correctly scoped (not overly broad).
- **Description** (primary trigger): includes concrete triggers/symptoms; avoids vague “does everything”.
- **False positives/negatives**: prompts it might match incorrectly vs fail to match.
- **Overlap risk**: collisions with other skills (same domain, similar trigger phrases).

Output: “Trigger Strength” rating + rewrite suggestions.
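The mechanical parts of the frontmatter audit can be scripted; false positives, false negatives, and overlap still need judgment. This sketch reads flat `key: value` frontmatter without a YAML library, so it is not a YAML parser. The kebab-case name rule, the "Use when" opening, and the 1024-character description cap are assumptions (some agent runtimes enforce a cap like this), not rules stated by this skill.

```python
import re


def parse_frontmatter(text: str) -> dict[str, str]:
    """Minimal reader for flat `key: value` frontmatter.

    Not a YAML parser: nested keys, lists, and multi-line values are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md does not start with a frontmatter block")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed")


def trigger_findings(fields: dict[str, str]) -> list[str]:
    """Heuristic trigger checks; the rules and the length cap are assumptions."""
    findings = []
    name = fields.get("name", "")
    desc = fields.get("description", "")
    if not re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", name):
        findings.append("name is missing or not lowercase kebab-case")
    if not desc:
        findings.append("description is missing")
    elif not desc.lower().startswith("use when"):
        findings.append("description does not open with a concrete 'Use when ...' trigger")
    if len(desc) > 1024:
        findings.append("description is longer than 1024 characters")
    return findings


# Abridged copy of this skill's own frontmatter.
skill_md = """---
name: skill-review-audit
description: Use when a user asks to review, interpret, or audit an AI agent skill.
---
# Skill Review & Audit
"""
print(trigger_findings(parse_frontmatter(skill_md)))  # []
print(trigger_findings({"name": "Helper", "description": "Does everything."}))
```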

### 3) Capability Model (What It Teaches)

Extract and summarize:

- Core tasks it claims to support.
- Preconditions and assumptions (tech stack, tools installed, access levels).
- Deliverables (expected outputs, formats, artifacts).
- Anti-scope (“When NOT to use”) and limitations (explicit or missing).
- Degrees of freedom: where it is prescriptive vs heuristic.

If the skill includes references, don’t assume the main SKILL.md is complete; read (or at least sample) the reference files to confirm scope.
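A quick cross-check shows which reference files SKILL.md never mentions (likely unread scope) and which mentioned paths do not exist. This is a sketch using plain substring and regex matching, so an indirect mention ("see the rubric") counts as missing; treat the output as a reading list, not a verdict. The demo directory and its contents are synthetic.

```python
import re
import tempfile
from pathlib import Path


def reference_coverage(skill_root: str, subdir: str = "references") -> tuple[list[str], list[str]]:
    """Compare files under `subdir` with the paths SKILL.md mentions.

    Returns (unmentioned, dangling): files SKILL.md never names, and names
    that match no file. Matching is textual, so treat results as leads.
    """
    root = Path(skill_root)
    text = (root / "SKILL.md").read_text(encoding="utf-8")
    folder = root / subdir
    present = []
    if folder.is_dir():
        present = sorted(p.relative_to(root).as_posix() for p in folder.rglob("*") if p.is_file())
    unmentioned = [rel for rel in present if rel not in text]
    # Strip a trailing sentence period so "see references/x.md." still resolves.
    mentioned = {m.rstrip(".") for m in re.findall(rf"{re.escape(subdir)}/[\w./-]+", text)}
    dangling = sorted(m for m in mentioned if not (root / m).is_file())
    return unmentioned, dangling


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "references").mkdir()
    (root / "SKILL.md").write_text(
        "Use `references/risk-taxonomy.md` and `references/scoring-rubric.md`.\n",
        encoding="utf-8",
    )
    for name in ("risk-taxonomy.md", "review-checklist.md"):
        (root / "references" / name).write_text("# notes\n", encoding="utf-8")
    unmentioned, dangling = reference_coverage(tmp)

print(unmentioned)  # ['references/review-checklist.md']
print(dangling)     # ['references/scoring-rubric.md']
```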

### 4) Tooling & Side-Effects Map

Build a table of *everything the skill instructs the agent to do*:

- Shell commands (including examples).
- Network access (curl/wget, HTTP clients, package installs, API calls).
- File system writes (what paths, destructive operations, deletes).
- Privilege/permissions (sudo, elevated access, credential usage).
- External dependencies (libraries, CLIs, SaaS).

For each, record: intent, required permissions, risk, and safe alternatives (sandbox, dry-run, allowlists).
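A regex first pass can locate candidate lines for the table before you fill in intent, permissions, risk, and alternatives by hand. This is a sketch: the categories and patterns below are illustrative choices, the `remote_exec` category (piping into a shell) is added here, and the scanner does not parse shell syntax, so expect both misses and false hits.

```python
import re
from dataclasses import dataclass

# Illustrative first-pass patterns, one per category; tune them to the CLIs a skill uses.
SIDE_EFFECT_PATTERNS = {
    "network": re.compile(r"\b(curl|wget|pip3? install|npm install|git clone|ssh)\b"),
    "remote_exec": re.compile(r"\|\s*(sudo\s+)?(sh|bash|zsh)\b"),
    "destructive": re.compile(r"\brm\s+-[a-zA-Z]*[rf]|\bgit\s+reset\s+--hard\b|\bdrop\s+table\b", re.IGNORECASE),
    "privilege": re.compile(r"\b(sudo|chown)\b|\bchmod\s+777\b"),
    "file_write": re.compile(r"\s>>?\s*[\w./~$-]+|\b(tee|mv|cp)\b"),
}


@dataclass
class SideEffect:
    line_no: int
    category: str
    command: str


def map_side_effects(text: str) -> list[SideEffect]:
    """Return one row per (line, category) match; locates candidates, does not parse shell."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in SIDE_EFFECT_PATTERNS.items():
            if pattern.search(line):
                rows.append(SideEffect(line_no, category, line.strip()))
    return rows


sample = """Install deps:
sudo pip install requests
curl -fsSL https://example.com/setup.sh | sh
rm -rf build/
echo done > status.txt
"""
print("| line | category | command |")
print("| --- | --- | --- |")
for row in map_side_effects(sample):
    # Escape pipes so commands like `curl ... | sh` do not break the Markdown table.
    command = row.command.replace("|", "\\|")
    print(f"| {row.line_no} | {row.category} | `{command}` |")
```

Run it over SKILL.md, every reference file, and every script; then add the intent, permission, risk, and safe-alternative columns manually for each row.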

### 5) Security / Privacy / Safety Risk Assessment

Use `references/risk-taxonomy.md` to assess:

- **Prompt injection** exposure (especially if the skill fetches external content).
- **Command injection** risks (string interpolation into shell; unsafe copy/paste patterns).
- **Destructive operations** (rm -rf, overwriting, migrations, irreversible actions).
- **Secrets handling** (API keys, env vars, logs, redaction).
- **Supply-chain** risks (install scripts, unpinned deps, untrusted sources).
- **Data exfiltration** (uploading files, telemetry, “paste logs here” patterns).
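Command injection is easiest to judge with the unsafe and safe patterns side by side. A minimal Python illustration follows; `cat` and the payload are arbitrary examples, and the unsafe string is only built, never executed.

```python
import shlex
import subprocess
import sys


def unsafe_command(filename: str) -> str:
    # UNSAFE pattern to flag: untrusted text interpolated into a shell string.
    # Run with shell=True, a value like "x; rm -rf ~" would start a second command.
    return f"cat {filename}"


def quoted_command(filename: str) -> str:
    # Safer when a shell string is unavoidable: quote each untrusted argument.
    return f"cat {shlex.quote(filename)}"


def run_without_shell(filename: str) -> str:
    # Safest: pass an argument list with no shell, so the value stays one argv entry.
    # The child is Python echoing argv[1], which keeps this demo free of side effects.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


payload = "notes.txt; rm -rf ~"
print(unsafe_command(payload))     # cat notes.txt; rm -rf ~
print(quoted_command(payload))     # cat 'notes.txt; rm -rf ~'
print(run_without_shell(payload))  # notes.txt; rm -rf ~
```

When a skill tells the agent to build commands from user input or fetched content, check which of these three shapes it uses.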

Output: severity × likelihood per risk + mitigations + “safe-by-default” recommendations.
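Keeping severity and likelihood as separate fields, and scoring only their product, avoids the "everything is risky" failure listed under Red Flags. This sketch assumes a 1–3 scale for both axes and three action bands (`block`, `mitigate`, `note`); if `references/risk-taxonomy.md` defines its own scale, use that instead. The example risks are hypothetical.

```python
from dataclasses import dataclass

# Assumed 1-3 scales; align with references/risk-taxonomy.md if it defines its own.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    name: str
    severity: str    # impact if it happens
    likelihood: str  # chance it happens when the skill is followed as written

    @property
    def score(self) -> int:
        return LEVELS[self.severity] * LEVELS[self.likelihood]

    @property
    def band(self) -> str:
        # Assumed cut-offs over the possible products 1, 2, 3, 4, 6, 9.
        if self.score >= 6:
            return "block"     # fix before installing
        if self.score >= 3:
            return "mitigate"  # install with a guardrail
        return "note"          # record and move on


risks = [
    Risk("curl | sh installer", severity="high", likelihood="medium"),
    Risk("logs may contain tokens", severity="medium", likelihood="medium"),
    Risk("unpinned dev dependency", severity="low", likelihood="medium"),
]
for r in sorted(risks, key=lambda r: r.score, reverse=True):
    print(f"{r.name}: {r.severity} x {r.likelihood} = {r.score} ({r.band})")
```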

### 6) Quality & Correctness Review

- Verify examples for internal consistency (missing imports, wrong prop precedence, mismatched ARIA ids, etc.).
- Check for missing edge cases (cancellation, cleanup, concurrency, accessibility, i18n).
- Check “progressive disclosure” quality: is SKILL.md lean and navigational, with details in `references/`?
- Check for outdated or unstable advice (versions, APIs likely to change); suggest pinning versions and dating time-sensitive guidance.
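Unpinned installs are one of the easier instances of unstable advice to detect. The sketch below is a heuristic for `pip install` and `npm install` lines only. It treats `==` (pip) and an exact `x.y.z` after `@` (npm) as pinned, and skips `-r`/`-c` file arguments, which then need their own review. Other package managers and hash pinning are out of scope.

```python
import re

PIP_INSTALL = re.compile(r"\bpip3?\s+install\s+(.+)")
NPM_INSTALL = re.compile(r"\bnpm\s+(?:install|i)\s+(.+)")
EXACT_SEMVER = re.compile(r"\d+\.\d+\.\d+")
# Their argument is a file, not a package; review that file separately.
OPTIONS_WITH_VALUE = {"-r", "--requirement", "-c", "--constraint"}


def unpinned_installs(text: str) -> list[str]:
    """List install arguments that carry no exact version pin (heuristic)."""
    found = []
    for line in text.splitlines():
        pip = PIP_INSTALL.search(line)
        if pip:
            skip = False
            for arg in pip.group(1).split():
                if skip:
                    skip = False
                elif arg in OPTIONS_WITH_VALUE:
                    skip = True
                elif not arg.startswith("-") and "==" not in arg:
                    found.append(f"pip: {arg}")
        npm = NPM_INSTALL.search(line)
        if npm:
            for arg in npm.group(1).split():
                if arg.startswith("-"):
                    continue
                # Scoped packages start with '@', so strip it before splitting off the version.
                _, _, version = arg.lstrip("@").partition("@")
                if not EXACT_SEMVER.fullmatch(version):
                    found.append(f"npm: {arg}")
    return found


commands = """pip install requests==2.31.0 rich
npm install lodash@4.17.21 @types/node left-pad@^1.3.0
pip install -r requirements.txt
"""
print(unpinned_installs(commands))
# ['pip: rich', 'npm: @types/node', 'npm: left-pad@^1.3.0']
```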

### 7) Maintainability & Operational Fit

- Structure: clear headings, searchable keywords, minimal duplication.
- Update strategy: versioning, ownership, changelog expectations (even if no changelog file exists).
- Testability: are scripts tested? Is there a validation workflow?
- Portability: OS assumptions, shell assumptions, tool availability.
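Shell assumptions often hide in a script's interpreter line. This sketch reports each script's shebang and, for scripts that declare plain `sh`, a few constructs that usually require bash. The `BASHISMS` list is illustrative and incomplete, and the demo scripts are synthetic, not this skill's own.

```python
import tempfile
from pathlib import Path

# Constructs that usually need bash rather than POSIX sh; illustrative, not complete.
BASHISMS = ("[[", "declare ", "mapfile ", "<(", "${BASH_")


def script_portability(skill_root: str) -> dict[str, list[str]]:
    """Map each file under scripts/ to its interpreter line plus bashisms under a plain sh shebang."""
    root = Path(skill_root)
    scripts = root / "scripts"
    report: dict[str, list[str]] = {}
    if not scripts.is_dir():
        return report
    for path in sorted(p for p in scripts.rglob("*") if p.is_file()):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        shebang = lines[0].strip() if lines and lines[0].startswith("#!") else "(no shebang)"
        notes = [shebang]
        if shebang.endswith(("/sh", " sh")):
            hits = sorted({b for b in BASHISMS for line in lines if b in line})
            notes += [f"bashism under sh: {h.strip()}" for h in hits]
        report[path.relative_to(root).as_posix()] = notes
    return report


with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / "scripts"
    scripts.mkdir()
    (scripts / "example.sh").write_text('#!/bin/sh\nif [[ -d "$1" ]]; then echo ok; fi\n')
    (scripts / "helper.py").write_text('print("hi")\n')
    result = script_portability(tmp)

print(result)
```

A missing shebang or a bashism under `#!/bin/sh` is a portability note for the report, since behavior then depends on which shell `/bin/sh` points to.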

### 8) Improvement Plan (Prioritized)

Provide:

- Quick wins (low effort / high impact).
- Structural changes (refactor into references, add scripts, add checklists).
- Safety hardening (guardrails, confirmations, allowlists).
- “Definition of Done” for the next iteration.
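Ordering the plan is a sort over impact and effort, with each item carrying the observed gap it fixes. This sketch assumes 1–3 scores and defines a quick win as impact of at least 2 with effort 1; both choices, and the example items, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Improvement:
    title: str
    gap: str     # the observed gap this fixes; untied advice is a red flag
    impact: int  # assumed 1 (low) .. 3 (high)
    effort: int  # assumed 1 (low) .. 3 (high)

    @property
    def quick_win(self) -> bool:
        return self.impact >= 2 and self.effort == 1


def prioritize(items: list[Improvement]) -> list[Improvement]:
    """Quick wins first, then highest impact, then lowest effort."""
    return sorted(items, key=lambda i: (not i.quick_win, -i.impact, i.effort))


plan = prioritize([
    Improvement("Split examples into references/", "SKILL.md is long and mixes detail with navigation", impact=2, effort=3),
    Improvement("Add 'When NOT to use' section", "no anti-scope stated", impact=2, effort=1),
    Improvement("Confirm before rm -rf", "destructive step runs unprompted", impact=3, effort=1),
])
for item in plan:
    print(item.quick_win, item.title)
```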

### 9) Produce The Report

Use `references/report-template.md` and keep:

- Facts separated from recommendations
- Explicit uncertainty markers when you did not verify something
- Concrete examples (commands, paths, prompts) where useful
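Separating facts from recommendations, and marking what was not verified, is simpler when each finding is recorded as structured data before rendering. The real layout comes from `references/report-template.md`, which is not reproduced here. The headings and the `_(unverified)_` marker below are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    text: str
    kind: str       # "fact" or "recommendation"
    verified: bool  # False when you did not check it directly


def render_findings(findings: list[Finding]) -> str:
    """Render facts and recommendations under separate headings, marking unverified items."""
    out = []
    for kind, heading in (("fact", "### Observed Facts"), ("recommendation", "### Recommendations")):
        out.append(heading)
        for f in findings:
            if f.kind == kind:
                marker = "" if f.verified else " _(unverified)_"
                out.append(f"- {f.text}{marker}")
        out.append("")
    return "\n".join(out).rstrip() + "\n"


findings = [
    Finding("Frontmatter defines name and description", "fact", True),
    Finding("Install timestamp matches upstream release", "fact", False),
    Finding("Add a dry-run flag to the install step", "recommendation", True),
]
print(render_findings(findings))
```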

## Red Flags — Stop And Re-check

- Only read `SKILL.md` and ignored `scripts/` / `references/`.
- Listed risks without mapping concrete commands/side effects.
- No provenance/supply-chain notes.
- No severity/likelihood distinction (everything “risky”).
- Suggested running scripts you did not read.
- Gave recommendations without tying them to a specific observed gap.
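Three of these red flags can be checked mechanically if you track which files you actually read during the review. This is a hypothetical self-check, not part of the skill's tooling. It covers unread `scripts/`/`references/`, suggesting unread scripts, and a missing severity/likelihood split; the other flags still need judgment.

```python
def red_flags(files_in_skill: set[str], files_read: set[str],
              scripts_suggested: set[str], risks_have_likelihood: bool) -> list[str]:
    """Return the mechanically detectable red flags for a finished review."""
    flags = []
    unread = {f for f in files_in_skill if f.startswith(("scripts/", "references/"))} - files_read
    if unread:
        flags.append(f"unread scripts/references: {sorted(unread)}")
    unread_suggested = scripts_suggested - files_read
    if unread_suggested:
        flags.append(f"suggested running unread scripts: {sorted(unread_suggested)}")
    if not risks_have_likelihood:
        flags.append("risks lack a severity/likelihood split")
    return flags


review = red_flags(
    files_in_skill={"SKILL.md", "scripts/scan_skill.sh", "references/risk-taxonomy.md"},
    files_read={"SKILL.md", "references/risk-taxonomy.md"},
    scripts_suggested={"scripts/scan_skill.sh"},
    risks_have_likelihood=True,
)
print(review)
```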

## Deep Checklist (Optional)

If you need a more exhaustive pass, use `references/review-checklist.md` and score with `references/scoring-rubric.md`.