# skill-creator > Guide for creating effective agent skills using BDD (Behavior-Driven Development). Use when creating new skills, editing existing skills, verifying skills work before deployment, or improving skill discoverability. Use when agents fail to follow workflows, when a skill doesn't get discovered, or when discipline skills get rationalized away. Covers skill anatomy, agent search optimization (ASO), behavioral testing with sub-agents, and hardening against rationalization. - Author: pentaxis93 - Repository: pentaxis93/devkit - Version: 20260130235019 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/pentaxis93/devkit - Web: https://mule.run/skillshub/@@pentaxis93/devkit~skill-creator:20260130235019 --- --- name: skill-creator description: >- Guide for creating effective agent skills using BDD (Behavior-Driven Development). Use when creating new skills, editing existing skills, verifying skills work before deployment, or improving skill discoverability. Use when agents fail to follow workflows, when a skill doesn't get discovered, or when discipline skills get rationalized away. Covers skill anatomy, agent search optimization (ASO), behavioral testing with sub-agents, and hardening against rationalization. --- # Creating Skills Guide for creating effective agent skills using BDD. ## What Skills Are Skills are modular packages that extend agent capabilities. They serve two distinct purposes: 1. **Knowledge packages** -- domain-specific information agents lack (API docs, schemas, tool workflows) 2. **Behavior enforcement** -- specific workflows and discipline that agents would otherwise shortcut or rationalize away **Skills are:** reusable techniques, patterns, tools, reference guides, enforceable workflows. **Skills are not:** narratives about how a problem was solved once, project-specific conventions (those go in CLAUDE.md / AGENTS.md), or standard practices well-documented elsewhere. ## Skill Types | Type | Purpose | Freedom | Testing | |------|---------|---------|---------| | **Technique** | Concrete method with steps | Medium | Application scenarios, edge cases | | **Pattern** | Mental model for problems | High | Recognition, counter-examples | | **Reference** | API docs, syntax, tools | High | Retrieval accuracy, application | | **Discipline** | Behavior enforcement | Low | Pressure scenarios, rationalization capture | Identify the type early -- it determines how prescriptive to write, how to test, and whether hardening is needed. ## Core Principles **Context budget is finite.** Skills share the context window with conversation history, system prompts, and other skills. Only add what agents don't already know. Challenge each paragraph: does this justify its token cost? **Test before shipping.** BDD for documentation: observe agent behavior without the skill (RED), write skill addressing specific failures (GREEN), close loopholes (REFACTOR). See [references/testing.md](references/testing.md). **Portability.** Use standard frontmatter (`name` + `description`). Avoid framework-specific paths or variables. Write for "agents" not a specific provider. **Progressive disclosure.** Description always in context (~100 words). SKILL.md body loads on trigger (< 500 lines). References load on demand (unlimited). See [references/structure.md](references/structure.md). ## Skill Anatomy ``` skill-name/ ├── SKILL.md # Required, < 500 lines ├── scripts/ # Executable code ├── references/ # Docs loaded on demand └── assets/ # Files used in output ``` ### SKILL.md Required. Two parts: - **Frontmatter** (YAML): `name` and `description`. These are the only fields agents read to decide whether to load the skill. Write them for discoverability. See [references/discovery.md](references/discovery.md). - **Body** (Markdown): Instructions and guidance. Loaded only after the skill triggers. ### Bundled Resources **scripts/** -- Executable code for tasks that need deterministic reliability or would be rewritten repeatedly. May be executed without loading into context. **references/** -- Documentation loaded on demand. For heavy reference material (> 100 lines), API docs, detailed guides. Keep SKILL.md lean by moving details here. Include clear descriptions of when to load each file. **assets/** -- Files used in output (templates, images, fonts). Not loaded into context; used by agents when producing output. **Do not include:** README.md, CHANGELOG.md, INSTALLATION_GUIDE.md, or other auxiliary documentation. The skill should contain only what an agent needs to do the job. ## Writing Effective Descriptions (ASO) The `description` field is the primary discovery mechanism. Agents read it to decide whether to load the skill. Optimize for Agent Search Optimization (ASO): - **Include trigger conditions**: "Use when creating new skills..." - **Include symptoms**: "...when agents fail to follow workflows..." - **Include keywords**: error messages, tool names, synonyms - **Describe the problem**, not the implementation ```yaml # BAD: Too abstract description: For skill development # GOOD: Triggers, symptoms, keywords description: >- Guide for creating effective agent skills using BDD. Use when creating new skills, editing existing skills, or verifying skills work before deployment. ``` See [references/discovery.md](references/discovery.md) for the full ASO guide. ## Degrees of Freedom Match prescriptiveness to the skill type: **High freedom** (patterns, references): multiple valid approaches, decisions depend on context, heuristics guide the work. Use text instructions and examples. **Medium freedom** (techniques): preferred pattern exists, some variation acceptable. Use pseudocode or parameterized scripts. **Low freedom** (discipline): operations are fragile, consistency is critical, specific sequence must be followed. Use explicit rules, red flags, rationalization counters. Think of the agent as walking a path: a narrow bridge needs guardrails (low freedom), an open field allows many routes (high freedom). ## Creation Process (BDD Cycle) This cycle applies to **new skills and edits to existing skills**. Editing a skill without testing is the same violation as creating one without testing. If the change is substantial, return to RED. ### 1. Understand with Concrete Examples Before creating a skill, understand how it will be used: - What triggers the skill? What would an agent search for? - What does good output look like? - What are the failure modes? - Who is the audience? (Other agents, not humans) Ask for concrete examples. If working with a user, ask: - "Can you give examples of how this skill would be used?" - "What would an agent be doing when it should reach for this skill?" ### 2. Identify Skill Type Classify as technique, pattern, reference, or discipline. This determines: - How prescriptive to write (degrees of freedom) - How to test (application vs. pressure scenarios) - Whether hardening is needed (discipline only) ### 3. RED: Baseline Without Skill Dispatch a sub-agent with a realistic scenario. Do NOT load the skill. Observe natural behavior: ``` Task tool: prompt: | [Realistic scenario that the skill addresses] [No mention of the skill or its rules] [Force a concrete decision or output] ``` Document: - What choices did the agent make? - What rationalizations did it use? (Capture verbatim) - Where did it fail or take shortcuts? **This is mandatory.** If you didn't watch an agent fail without the skill, you don't know what the skill needs to prevent. For discipline skills, use pressure scenarios combining 3+ pressures (time, sunk cost, authority, exhaustion). See [references/testing.md](references/testing.md). ### 4. Plan Reusable Contents Based on the baseline failures, identify what the skill needs: - What would agents re-derive every time without the skill? - What scripts would be rewritten repeatedly? - What reference material is needed but hard to find? - What workflow steps get skipped under pressure? ### 5. GREEN: Write Minimal Skill Address the specific baseline failures observed in RED. Do not add content for hypothetical cases -- write just enough to fix what you saw. Implement reusable resources (`scripts/`, `references/`, `assets/`) before writing SKILL.md -- this may require user input (e.g., brand assets, API docs). Test bundled scripts by running them before committing. Follow the anatomy above. Keep SKILL.md under 500 lines. Move heavy reference material to `references/`. Use imperative mood throughout. Consult these references based on the skill's needs: - **Multi-step processes**: See [references/workflows.md](references/workflows.md) - **Formatted output**: See [references/output-patterns.md](references/output-patterns.md) - **File organization**: See [references/structure.md](references/structure.md) #### Recommended SKILL.md Body Sections ```markdown # Skill Name ## Overview Core principle in 1-2 sentences. What is this? ## When to Use Symptoms and use cases (bullet list). When NOT to use. Small inline flowchart IF decision is non-obvious. ## Core Pattern (for techniques/patterns) Before/after comparison or key workflow. ## Quick Reference Table or bullets for scanning common operations. ## Implementation Detailed steps, inline code for simple patterns, links to reference files for heavy material. ## Common Mistakes What goes wrong + how to fix it. ``` Not every section is needed for every skill. Adapt to the skill type. ### 6. Verify GREEN Dispatch sub-agent WITH the skill loaded. Same scenarios as RED: ``` Task tool: prompt: | You have access to: [skill path] [Same scenario as RED phase] ``` Agent should now succeed or comply. If not, the skill is unclear -- revise and re-test. ### 7. REFACTOR: Close Loopholes If the agent found new ways to fail or rationalize: - Capture new rationalizations verbatim - Add explicit counters for each loophole - For discipline skills: build rationalization table, red flags list - See [references/persuasion-principles.md](references/persuasion-principles.md) - Re-test until bulletproof ### 8. Iterate from Real Usage After deployment, use the skill on real tasks: 1. Notice struggles or inefficiencies 2. Identify how SKILL.md or resources should change 3. Implement changes 4. Re-test (return to RED if the change is substantial) ## Naming Conventions - **Descriptive over generic**: `feature-discovery` not `utils` - **Lowercase hyphenated**: `root-cause-tracing` not `rootCauseTracing` - **Descriptive over generic**: `condition-based-waiting` not `async-helpers` - **Directory name = frontmatter name**: must match exactly ## Flowchart Usage Use flowcharts only for non-obvious decision points, process loops where you might stop too early, or "when to use A vs B" decisions. Never use flowcharts for reference material (use tables), code examples (use markdown blocks), or linear instructions (use numbered lists). See [references/graphviz-conventions.dot](references/graphviz-conventions.dot) for style rules. ## Anti-Patterns | Anti-Pattern | Why It's Bad | |-------------|--------------| | Narrative storytelling | "In session X, we found..." -- too specific, not reusable | | Multi-language dilution | One excellent example beats five mediocre ones | | Code in flowcharts | Can't copy-paste, hard to read | | Generic labels | step1, helper2 -- labels need semantic meaning | | Auxiliary docs | README.md, CHANGELOG.md add clutter | | Skipping RED phase | "Obviously clear" -- clear to you ≠ clear to agents | | Stopping after first GREEN | First pass ≠ bulletproof | | Hypothetical content | Write for observed failures, not imagined ones | ## Attribution Adapted from: - [anthropics/skills/skill-creator](https://github.com/anthropics/skills/tree/main/skills/skill-creator) -- structure, progressive disclosure, degrees of freedom, skill anatomy - [obra/superpowers-skills/writing-skills](https://github.com/obra/superpowers-skills/tree/main/skills/meta/writing-skills) -- BDD methodology, ASO, skill types, hardening, anti-patterns - [obra/superpowers-skills/testing-skills-with-subagents](https://github.com/obra/superpowers-skills/tree/main/skills/meta/testing-skills-with-subagents) -- pressure testing, rationalization capture, RED-GREEN-REFACTOR for docs