# skill-evaluator

> Comprehensive evaluation toolkit for analyzing Claude skills across security, quality, utility, and compliance dimensions. This skill should be used when users need to evaluate a skill before installation, review before publishing, or assess overall quality and safety. Performs 5-layer security analysis, validates structure and documentation, checks compliance with skill-creator guidelines, and generates markdown reports with scoring and recommendations.

- Author: bjulius
- Repository: bjulius/skill-evaluator
- Version: 20251118152152
- Stars: 13
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/bjulius/skill-evaluator
- Web: https://mule.run/skillshub/@@bjulius/skill-evaluator~skill-evaluator:20251118152152

---

---
name: skill-evaluator
license: MIT
description: Comprehensive evaluation toolkit for analyzing Claude skills across security, quality, utility, and compliance dimensions. This skill should be used when users need to evaluate a skill before installation, review before publishing, or assess overall quality and safety. Performs 5-layer security analysis, validates structure and documentation, checks compliance with skill-creator guidelines, and generates markdown reports with scoring and recommendations.
---

# Skill Evaluator

Comprehensive evaluation toolkit for analyzing Claude skills before installation or publication.

## Purpose

Evaluate Claude skills across four critical dimensions:

1. **Security** - Identify vulnerabilities, injection risks, privilege escalation, and security weaknesses
2. **Quality** - Assess code quality, documentation clarity, structural organization, and functionality
3. **Utility** - Evaluate practical value, usability, scope appropriateness, and effectiveness
4. **Compliance** - Validate adherence to skill-creator guidelines and best practices

Generate detailed markdown reports with scores (0-100), risk assessments, and actionable recommendations.

## When to Use This Skill

Use this skill when:

- **Evaluating skills before installation** - Assess safety and quality of third-party skills
- **Pre-publication review** - Validate skills before distributing to others
- **Security auditing** - Check for vulnerabilities and security risks
- **Quality assessment** - Review code quality and documentation
- **Compliance validation** - Ensure skills follow skill-creator guidelines

## Evaluation Modes

### Mode 1: Full Evaluation (Default)

**Usage:** "Evaluate this skill at [path/to/skill]"

Comprehensive analysis across all four dimensions with detailed scoring and recommendations.

**Output:** Complete markdown report with overall score, security analysis, quality assessment, utility evaluation, compliance validation, and recommendations.

### Mode 2: Security-Focused Quick Check

**Usage:** "Is this skill safe to install?" or "Check the security of [skill-path]"

Deep security analysis with brief checks on other dimensions.

**Output:** Security-focused report emphasizing vulnerabilities, risk level, and installation safety.

### Mode 3: Pre-Publication Review

**Usage:** "Review my skill before I publish it" or "Help me improve [skill-path] for publication"

Full evaluation with detailed, actionable improvement guidance for skill authors.

**Output:** Comprehensive report with prioritized recommendations for improvement.

## How to Use

### Basic Usage

1. **Provide the skill path** (directory or .zip file):
   ```
   "Evaluate the skill at /path/to/my-skill"
   "Is /path/to/skill.zip safe to install?"
   ```
2. **Claude will execute evaluation scripts** to analyze the skill:
   - `scripts/evaluate_skill.py` - Main orchestrator
   - `scripts/security_scanner.py` - 5-layer security analysis
   - `scripts/quality_checker.py` - Quality assessment
   - `scripts/compliance_validator.py` - Compliance validation
   - `scripts/report_generator.py` - Report creation
3. **Receive a markdown report** with scores, findings, and recommendations

### Understanding the Report

#### Overall Score (0-100)

Weighted calculation:

- Security: 35% (highest weight due to critical importance)
- Quality: 25%
- Utility: 20%
- Compliance: 20%

**Score Ranges:**

- **90-100**: EXCELLENT - Highly recommended
- **75-89**: GOOD - Recommended
- **60-74**: FAIR - Use with caution
- **40-59**: POOR - Not recommended
- **0-39**: CRITICAL - Do not install

#### Security Analysis

Uses **5-layer defense-in-depth architecture**:

1. **Layer 1: Input Validation & Sanitization** - Command injection, path traversal, file validation
2. **Layer 2: Execution Environment Control** - Privilege escalation, sandboxing, environment manipulation
3. **Layer 3: Output Sanitization** - XSS prevention, information disclosure, data exposure
4. **Layer 4: Privilege Management** - Credential handling, weak cryptography, authentication
5. **Layer 5: Self-Protection** - DoS patterns, SSRF, resource exhaustion

**Vulnerability Severity:**

- **CRITICAL**: Command injection, arbitrary code execution, privilege escalation
- **HIGH**: Path traversal, insecure deserialization, SSRF
- **MEDIUM**: Information disclosure, weak crypto, XSS
- **LOW**: Minor issues, hardening opportunities

**Security Overrides:**

- Security score < 50 → ❌ DO NOT INSTALL (automatic)
- Any CRITICAL vulnerability → ❌ DO NOT INSTALL (automatic)

#### Quality Assessment

Four quality dimensions (25 points each):

1. **Code Quality** - Readability, error handling, modularity, dependencies, best practices
2. **Documentation** - Purpose clarity, usage instructions, resource references, writing quality, completeness
3. **Structure & Organization** - Directory structure, file naming, YAML frontmatter
4. **Functionality** - Practical value, appropriate tool usage, reusability, completeness

#### Utility Evaluation

Assesses practical value (100 points):

- **Problem-solving value** (25 pts) - Addresses real needs
- **Usability** (25 pts) - Clear and easy to use
- **Scope** (25 pts) - Appropriate complexity and boundaries
- **Effectiveness** (25 pts) - Works as described

#### Compliance Validation

Validates against skill-creator guidelines (100 points):

- SKILL.md structure (10 pts)
- YAML frontmatter (20 pts)
- Progressive disclosure (15 pts)
- Scripts/references/assets usage (30 pts total)
- Writing style (10 pts)
- Trigger description (10 pts)

**Critical Violations (Auto-Fail):**

- Missing SKILL.md
- Missing required YAML fields
- Invalid YAML syntax

## Bundled Resources

### Scripts (`scripts/`)

Execute these for evaluation:

- **`evaluate_skill.py`** - Main orchestrator coordinating all analyses
- **`security_scanner.py`** - 5-layer security architecture with pattern detection
- **`quality_checker.py`** - Code quality, documentation, and structure assessment
- **`compliance_validator.py`** - Guideline adherence and compliance checking
- **`report_generator.py`** - Markdown report generation from results

### References (`references/`)

Load these for detailed evaluation criteria:

- **`security_patterns.md`** - Vulnerability pattern database with detection criteria and secure examples
- **`quality_criteria.md`** - Quality assessment rubrics and scoring guidelines
- **`compliance_checklist.md`** - skill-creator guideline requirements
- **`evaluation_methodology.md`** - Evaluation process, scoring formulas, and report structure

### Assets (`assets/`)

- **`report_template.md`** - Markdown report template with structured sections

## Evaluation Workflow

### Step 1: Skill Discovery

Accept skill input (directory or .zip), extract if needed, identify SKILL.md and bundled resources.

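
The discovery step could look roughly like the sketch below. It is an illustration only, not the code in `scripts/evaluate_skill.py`: the function name `discover_skill`, the temporary-directory prefix, and the choice to reject archive members that would land outside the extraction directory are all assumptions made for this example.

```python
import tempfile
import zipfile
from pathlib import Path


def discover_skill(source: str) -> Path:
    """Return the directory containing SKILL.md for a directory or .zip input.

    Hypothetical helper for illustration; not part of the bundled scripts.
    """
    path = Path(source).expanduser().resolve()

    if path.is_dir():
        root = path
    elif path.is_file() and zipfile.is_zipfile(path):
        # Extract into a fresh temporary directory (caller is expected to clean it up).
        root = Path(tempfile.mkdtemp(prefix="skill_eval_")).resolve()
        with zipfile.ZipFile(path) as archive:
            # Refuse members that would resolve outside the extraction directory.
            for member in archive.namelist():
                target = (root / member).resolve()
                if not target.is_relative_to(root):
                    raise ValueError(f"unsafe path in archive: {member}")
            archive.extractall(root)
    else:
        raise ValueError(f"not a directory or .zip file: {source}")

    # Prefer the SKILL.md closest to the root if several are present.
    candidates = sorted(root.rglob("SKILL.md"), key=lambda p: len(p.parts))
    if not candidates:
        raise FileNotFoundError(f"SKILL.md not found under {root}")
    return candidates[0].parent
```

A missing SKILL.md raises an error here because the compliance rules above treat it as a critical violation. A real orchestrator might instead record the violation and continue so the report can still be generated.
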
### Step 2: Run Analyses

Execute evaluations: Security Scanner → Quality Checker → Compliance Validator → Utility Evaluator

### Step 3: Calculate Scores

Apply weighted formula and override rules:

```
Overall = (Security × 0.35) + (Quality × 0.25) + (Utility × 0.20) + (Compliance × 0.20)
```

### Step 4: Generate Report

Create markdown report using template with executive summary, detailed analyses, and recommendations.

### Step 5: Save Report

Write report to `{skill_name}_evaluation_report.md` and present to user.

## Installation Recommendations

- **✅ HIGHLY RECOMMENDED** (90-100) - Excellent quality, safe to install
- **✅ RECOMMENDED** (75-89) - Good quality, safe to install
- **⚠️ USE WITH CAUTION** (60-74) - Review findings before installing
- **⚠️ NOT RECOMMENDED** (40-59) - Major improvements needed
- **❌ DO NOT INSTALL** (0-39 or security override) - Critical issues, unsafe

## Limitations

### Can Assess

- ✅ Static code analysis
- ✅ Pattern-based vulnerability detection
- ✅ Structure and compliance
- ✅ Documentation quality

### Cannot Assess

- ❌ Runtime behavior
- ❌ Performance at scale
- ❌ Novel attack vectors
- ❌ Subjective satisfaction

## ⚠️ Important Disclaimers

**READ CAREFULLY BEFORE USING THIS SKILL**

### No Guarantee of Safety

**This evaluation CANNOT determine with certainty that a skill is safe.**

Like all security analysis tools:

- **Cannot prove absence of vulnerabilities** - Only detect known patterns; novel or obfuscated attacks may go undetected
- **Static analysis limitations** - Cannot assess runtime behavior, dynamic code execution, or context-dependent risks
- **False negatives possible** - Sophisticated malicious code may evade pattern-based detection
- **Time-bound assessment** - New vulnerabilities may be discovered after evaluation

### Use as ONE Input Only

**This evaluation should be used as ONE input into your security decision, not the sole determining factor.**

You are responsible for:

1. **Manual code review** - Read and understand the skill's code yourself
2. **Test in isolated environment** - Run skills in sandboxed/test environments first
3. **Organizational policies** - Always follow your organization's security policies and approval processes
4. **Risk assessment** - Consider your specific threat model and risk tolerance
5. **Ongoing monitoring** - Continue to monitor skill behavior after installation

### Your Responsibility

- **YOU are responsible for skills you install** - Not the evaluator, not the skill author
- **Follow organizational policies** - Security policies override any evaluation recommendation
- **Trust but verify** - Even "HIGHLY RECOMMENDED" skills should be reviewed
- **When in doubt, don't install** - If unsure about a skill's safety, consult security experts

### Limitations of Automated Analysis

This tool performs **pattern-based static analysis**, which means:

- ✅ Good at: Detecting common vulnerability patterns, structural issues, compliance violations
- ❌ Cannot detect: Zero-day exploits, logic bombs, social engineering, supply chain attacks
- ❌ Cannot assess: Author trustworthiness, long-term maintenance, backdoor triggers
- ❌ Cannot guarantee: Complete security, absence of malicious intent, future safety

### Legal Disclaimer

**NO WARRANTIES**: This evaluation tool is provided "as-is" without warranties of any kind. The authors and contributors assume no liability for damages resulting from use of this tool or skills evaluated by it.

**USE AT YOUR OWN RISK**: You accept all risks associated with installing and using evaluated skills.

## Examples

### Example 1: Security Check

**User:** "Is /downloads/data-analyzer.zip safe?"

**Output:** Security report with vulnerabilities, risk level, and installation recommendation.

### Example 2: Pre-Publication

**User:** "Review my skill: /my-projects/excel-parser/"

**Output:** Full evaluation with priority improvements and publication readiness assessment.

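
To make the recommendation in reports like these more concrete, here is a minimal sketch of how the Step 3 formula, the score ranges, and the two security overrides could combine. The function names, the rounding to two decimals, and the sample scores are hypothetical choices for this example; the bundled scripts may compute things differently.

```python
# Weights from the "Overall Score" section.
WEIGHTS = {"security": 0.35, "quality": 0.25, "utility": 0.20, "compliance": 0.20}

# (minimum overall score, label) pairs from "Installation Recommendations".
BANDS = [
    (90, "HIGHLY RECOMMENDED"),
    (75, "RECOMMENDED"),
    (60, "USE WITH CAUTION"),
    (40, "NOT RECOMMENDED"),
    (0, "DO NOT INSTALL"),
]


def overall_score(scores: dict) -> float:
    """Weighted sum of the four dimension scores (each expected in 0-100)."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


def recommendation(scores: dict, has_critical_vulnerability: bool = False) -> str:
    """Apply the security overrides first, then map the overall score to a band."""
    if scores["security"] < 50 or has_critical_vulnerability:
        return "DO NOT INSTALL"
    # Rounding is an assumption here, used to avoid floating-point edge cases at band limits.
    total = round(overall_score(scores), 2)
    for minimum, label in BANDS:
        if total >= minimum:
            return label
    return "DO NOT INSTALL"


if __name__ == "__main__":
    # Hypothetical dimension scores, not taken from a real evaluation.
    sample = {"security": 80, "quality": 90, "utility": 70, "compliance": 60}
    print(round(overall_score(sample), 2), recommendation(sample))
```

Checking the overrides before the weighted score means that a skill with strong quality, utility, and compliance scores still gets DO NOT INSTALL when its security score is below 50 or a CRITICAL vulnerability is found.
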
### Example 3: Full Evaluation

**User:** "Evaluate /skills/api-connector/"

**Output:** Complete report with all dimensions, scores, and recommendations.

## Best Practices for Skill Authors

### Security

- Never use subprocess with shell=True
- Validate and sanitize inputs
- Use Path.resolve() for paths
- Avoid hardcoded credentials
- Implement error handling

### Quality

- Write clean, readable code
- Add type hints and docstrings
- Remove TODO placeholders
- Provide comprehensive documentation

### Compliance

- Use imperative/infinitive form
- Write clear, specific descriptions
- Follow progressive disclosure
- Organize files correctly
- Use lowercase-with-hyphens naming

### Utility

- Solve real problems
- Provide clear instructions
- Include practical examples
- Ensure appropriate scope

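
As an illustration of the Security items above, the sketch below shows one way a skill script might combine `Path.resolve()` with a containment check and call `subprocess.run` with an argument list instead of `shell=True`. The helper names `resolve_inside` and `run_without_shell` and the 30-second timeout are assumptions for this example, not requirements from the skill-creator guidelines.

```python
import subprocess
import sys
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes {base}: {user_path}")
    return candidate


def run_without_shell(args: list[str]) -> str:
    """Run a command from an argument list; no shell ever parses the arguments."""
    result = subprocess.run(
        args,
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit code
        timeout=30,   # assumed limit so a hung process cannot block forever
    )
    return result.stdout


if __name__ == "__main__":
    # The argument is passed through literally, so the ';' is not interpreted.
    print(run_without_shell(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "report.md; echo injected"]
    ))
```

Because each argument reaches the child process as a separate string, shell metacharacters in user input are treated as data rather than as commands.
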