# run-benchmark > Run and interpret the File API vs Inline benchmark for Gemini performance testing. Use when discussing performance optimization, caching strategies, or comparing document upload approaches. - Author: nimag - Repository: nimag/fast - Version: 20251222124013 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/nimag/fast - Web: https://mule.run/skillshub/@@nimag/fast~run-benchmark:20251222124013 --- --- name: run-benchmark description: Run and interpret the File API vs Inline benchmark for Gemini performance testing. Use when discussing performance optimization, caching strategies, or comparing document upload approaches. allowed-tools: - Bash - Read --- # Run Benchmark Tool ## Purpose Compare Gemini File API vs Inline document approaches for performance. ## What It Tests 1. **File API**: Upload documents once, reuse cached URIs for multiple queries 2. **Inline**: Send raw document bytes with each request The benchmark shuffles document order each round to prevent Gemini's native caching from affecting results. ## Running the Benchmark ### Basic Usage ```bash export GEMINI_API_KEY="your-key" make build ./bin/benchmark -docs test_loan_files/loan_file_1_LN-2024-001847 ``` ### With Options ```bash ./bin/benchmark \ -docs /path/to/documents \ -rounds 20 \ -max-docs 10 \ -json ``` ### CLI Flags ``` -docs string Directory containing documents (required) -rounds int Number of test rounds per method (default 10) -max-docs int Maximum income documents to use (default 6) -json Output results as JSON -income Only use income documents (default true) ``` ### Using Makefile ```bash make benchmark DOCS=test_loan_files/loan_file_1_LN-2024-001847 ROUNDS=10 ``` ## Understanding Results ### Output Structure ``` PHASE 1: INLINE DOCUMENTS Round 1: [shuffled order] -> time, tokens ... PHASE 2: FILE API Upload: X seconds (one-time) Round 1: [shuffled order] -> time, tokens ... FINAL COMPARISON - Total time comparison - Average per-query time - Token usage - Winner & speedup factor - Break-even analysis ``` ### Key Metrics | Metric | Meaning | |--------|---------| | Upload time | One-time cost for File API | | Total time | Sum of all operations | | Avg per round | Mean time per query | | Min/Max round | Query time variance | | Speedup | How much faster winner is | | Break-even | Queries needed for File API to win | ### Interpreting Results **File API wins when:** - Many queries against same documents - Break-even point is low (< 10 queries) - Per-query savings compound **Inline wins when:** - Few queries (< break-even) - Different documents each time - Simplicity preferred ## Example Output ``` TIMING COMPARISON ┌─────────────────┬──────────────────┬──────────────────┐ │ Metric │ File API │ Inline Docs │ ├─────────────────┼──────────────────┼──────────────────┤ │ Upload (1x) │ 1.976s │ N/A │ │ Total time │ 1m13.733s │ 1m14.454s │ │ Avg per round │ 7.176s │ 7.445s │ └─────────────────┴──────────────────┴──────────────────┘ BREAK-EVEN ANALYSIS Upload overhead: 1.976s Savings per query: 270ms Break-even at: 7.3 queries ``` ## Why Shuffled Order? Documents are shuffled each round because: - Gemini may cache based on content/order - Shuffling ensures each query is "fresh" - Gives accurate per-query timing - More realistic for production workloads ## Test Questions The benchmark uses varied income-related questions: - Annual/monthly income extraction - Employer information - YTD income calculation - Deductions and withholdings - Income source classification - Tax year coverage - And more... ## Recommendations | Scenario | Recommendation | |----------|----------------| | Underwriter iterating on loan | File API | | One-off document analysis | Inline | | Batch processing same docs | File API | | Real-time different docs | Inline | ## Related Files - `cmd/benchmark/main.go` - Benchmark implementation - `internal/gemini/client.go` - Both API approaches - `internal/gemini/cache.go` - File caching logic