# evaluate

> Run comprehensive evaluation of the Financial RAG system to measure quality, performance, and cost metrics. Use when testing RAG performance or validating system quality.

- Author: Logan Liu
- Repository: JumpLogan/ai-financial-advisor
- Version: 20260125213233
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/JumpLogan/ai-financial-advisor
- Web: https://mule.run/skillshub/@@JumpLogan/ai-financial-advisor~evaluate:20260125213233

---

---
name: evaluate
description: Run comprehensive evaluation of the Financial RAG system to measure quality, performance, and cost metrics. Use when testing RAG performance or validating system quality.
---

# Evaluate RAG System

When the user invokes this skill, run a comprehensive evaluation of the Financial RAG system.

## Steps to Follow

### 1. Check Prerequisites

Before running evaluation, verify:

- `run_evaluation.py` exists
- `data/chroma_db/` directory exists (vector database)
- `.env` file exists with OPENAI_API_KEY configured

If any prerequisite is missing, inform the user what needs to be set up.

### 2. Parse User Arguments

The evaluation script supports these optional arguments:

- `--test-cases`: Use custom test cases JSON file
- `--model`: Use specific model (default: gpt-3.5-turbo)
- `--output-dir`: Specify output directory (default: evaluation_results)

### 3. Run the Evaluation

Execute: `python run_evaluation.py [arguments]`

Monitor the output and show progress to the user.

### 4. Display Results

After completion, provide a summary including:

- Pass rate (percentage of tests that passed)
- Hallucinations detected
- Average latency per query
- Total cost and cost per query
- Report file locations

### 5. Provide Recommendations

Based on results:

- If pass rate < 80%: Suggest improvements
- If pass rate >= 80%: Acknowledge good performance
- Offer to analyze detailed results or failed test cases

### 6. Offer Next Steps

Suggest follow-up actions like analyzing the report, examining failures, or re-running with different settings.
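
The prerequisite check (step 1), command construction (steps 2 and 3), and the 80% threshold (step 5) could be sketched in Python roughly as follows. This is a minimal sketch, not part of the repository: the helper names `env_has_api_key`, `missing_prerequisites`, `build_command`, and `recommend` are hypothetical, and it assumes the pass rate is already available as a percentage, since the output format of `run_evaluation.py` is not described here.

```python
from pathlib import Path

PASS_RATE_THRESHOLD = 80.0  # percent; the cutoff named in step 5


def env_has_api_key(env_text):
    """Return True if the .env text assigns a non-empty OPENAI_API_KEY."""
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not assignments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, _, value = line.partition("=")
        if name.strip() == "OPENAI_API_KEY" and value.strip().strip("'\""):
            return True
    return False


def missing_prerequisites(root="."):
    """List the step 1 prerequisites that are absent under `root`."""
    root = Path(root)
    missing = []
    if not (root / "run_evaluation.py").is_file():
        missing.append("run_evaluation.py")
    if not (root / "data" / "chroma_db").is_dir():
        missing.append("data/chroma_db/")
    env_file = root / ".env"
    if not env_file.is_file():
        missing.append(".env")
    elif not env_has_api_key(env_file.read_text()):
        missing.append("OPENAI_API_KEY in .env")
    return missing


def build_command(test_cases=None, model=None, output_dir=None):
    """Build the argument list for `python run_evaluation.py [arguments]`.

    Options left as None are omitted so the script's own defaults apply.
    """
    cmd = ["python", "run_evaluation.py"]
    if test_cases is not None:
        cmd += ["--test-cases", str(test_cases)]
    if model is not None:
        cmd += ["--model", str(model)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    return cmd


def recommend(pass_rate_percent):
    """Map a pass rate (in percent) to the step 5 recommendation."""
    if pass_rate_percent < PASS_RATE_THRESHOLD:
        return "Pass rate below 80%: suggest improvements."
    return "Pass rate at or above 80%: good performance."


if __name__ == "__main__":
    problems = missing_prerequisites(".")
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("Would run:", " ".join(build_command()))
```

The list returned by `build_command` can be passed directly to `subprocess.run`, which avoids shell quoting issues when a test-cases path contains spaces.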