# run-benchmark

> Run the benchmark evaluation for one or all framework implementations against a scenario. Use this skill when the user wants to execute benchmarks, run evaluations, or test a framework implementation. Handles execution and result collection.

- Author: Raphael Ballet
- Repository: rballet/poc-ai-workflow-frameworks
- Version: 20260209143915
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-09
- Source: https://github.com/rballet/poc-ai-workflow-frameworks
- Web: https://mule.run/skillshub/@@rballet/poc-ai-workflow-frameworks~run-benchmark:20260209143915

---

---
name: run-benchmark
description: >
  Run the benchmark evaluation for one or all framework implementations
  against a scenario. Use this skill when the user wants to execute
  benchmarks, run evaluations, or test a framework implementation.
  Handles execution and result collection.
---

# Run Benchmark

## When to use

Use when the user asks to run a benchmark, evaluate a framework, or test an implementation.

## Prerequisites

- The `OPENAI_API_KEY` environment variable must be set
- All packages must be installed: `uv sync --all-packages`

## Steps

1. Verify the environment by printing only the first 8 characters of the key:
   ```bash
   echo $OPENAI_API_KEY | head -c 8
   ```

2. Run for a single framework:
   ```bash
   uv run python scripts/run_eval.py --framework --scenario rag_qa
   ```
   Valid framework names (the value passed to `--framework`): `pydantic_ai`, `langgraph`, `smolagents`

3. Run for all frameworks:
   ```bash
   uv run python scripts/run_eval.py --all --scenario rag_qa
   ```

4. Results are saved to `results/__.json`

5. Generate the comparison report:
   ```bash
   uv run python scripts/compare.py results/*.json -o results/comparison.md
   ```

## Troubleshooting

- If import errors occur, run `uv sync --all-packages` to reinstall
- If API errors occur, verify that `OPENAI_API_KEY` is set and valid
- Each framework run takes ~30-60 seconds (7 questions + judge calls)
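
## Example: preflight check and command construction

The steps above can be sketched as a small Python helper that checks the API key and assembles the `run_eval.py` command line before anything is executed. This is a minimal sketch, not part of the repository: the function names `masked_key_prefix` and `build_eval_command` are hypothetical, and it assumes the framework name is passed as the value of `--framework`. Only the flags shown in the steps (`--framework`, `--all`, `--scenario`) are used.

```python
import os

# Framework names listed as valid in the steps above.
VALID_FRAMEWORKS = ("pydantic_ai", "langgraph", "smolagents")


def masked_key_prefix(env=None, length=8):
    """Return the first `length` characters of OPENAI_API_KEY.

    Mirrors `echo $OPENAI_API_KEY | head -c 8`, but raises if the
    variable is missing or empty instead of printing nothing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key[:length]


def build_eval_command(framework=None, scenario="rag_qa"):
    """Build the argument list for scripts/run_eval.py.

    framework=None selects every framework via --all. Passing the name
    as the value of --framework is an assumption of this sketch.
    """
    base = ["uv", "run", "python", "scripts/run_eval.py"]
    if framework is None:
        return base + ["--all", "--scenario", scenario]
    if framework not in VALID_FRAMEWORKS:
        raise ValueError(f"unknown framework: {framework!r}")
    return base + ["--framework", framework, "--scenario", scenario]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for name in VALID_FRAMEWORKS:
        print(" ".join(build_eval_command(name)))
    print(" ".join(build_eval_command()))
```

To execute a command rather than print it, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Rejecting unknown framework names up front avoids spending the ~30-60 seconds of a run on a typo.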