# spreadsheet-llm > Compress Excel files into LLM-friendly format and perform AI-based cell range recognition. Use this when users need to process, analyze, or extract data from spreadsheet files. - Author: miaobuao - Repository: miaobuao/spreadsheet-llm - Version: 20251203140815 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/miaobuao/spreadsheet-llm - Web: https://mule.run/skillshub/@@miaobuao/spreadsheet-llm~spreadsheet-llm:20251203140815 --- --- name: spreadsheet-llm description: Compress Excel files into LLM-friendly format and perform AI-based cell range recognition. Use this when users need to process, analyze, or extract data from spreadsheet files. --- # SpreadsheetLLM CLI Tool ## When to Use This Skill Use this tool when the user asks to: - Compress or preprocess Excel/spreadsheet files for LLM analysis - Extract tables or data ranges from spreadsheets - Analyze spreadsheet structure and identify meaningful data regions - Prepare large spreadsheets for token-efficient LLM processing - Find specific data patterns in Excel files (sales tables, financial data, etc.) ## Command Syntax ```bash uv run tools/spreadsheet-llm-cli.py [OPTIONS] INPUT_FILE ``` ## Key Options ### Compression Modes - **Simple mode** (default): Groups cells by value only - **Format-aware mode** (`-f` or `--format-aware`): Groups by value AND formatting ### LLM Recognition - `-r, --recognize`: Enable AI-based cell range identification (auto-detects all data regions) - `-m, --model MODEL`: Specify model (default: `google/gemini-2.5-pro`) - **Recommended**: `google/gemini-2.5-pro` - Best performance and cost-effectiveness - Small sheets (<50 anchors): `google/gemini-2.5-pro` or `gpt-4o-mini` - Medium sheets (50-200): `google/gemini-2.5-pro` (recommended) - Large sheets (>200): `google/gemini-2.5-pro` (best value) - `--original-coords`: **Return original spreadsheet coordinates - Recommended for agent code generation** ### Other Options - `-o, --output-dir DIR`: Output directory (default: `output/`) - `-s, --sheet SHEET`: Process specific sheet (by index or name) ## Common Usage Patterns ### 1. Basic Compression ```bash # Simple compression uv run tools/spreadsheet-llm-cli.py input.xlsx # Format-aware compression uv run tools/spreadsheet-llm-cli.py input.xlsx -f ``` ### 2. AI-Powered Region Recognition ```bash # Use specific model uv run tools/spreadsheet-llm-cli.py complex.xlsx -r --original-coords -m google/gemini-2.5-pro # Format-aware recognition for less tokens uv run tools/spreadsheet-llm-cli.py sales.xlsx -f -r --original-coords -m google/gemini-2.5-pro ``` ### 3. Process Specific Sheets ```bash # By index (0-based) uv run tools/spreadsheet-llm-cli.py workbook.xlsx -s 1 # By name uv run tools/spreadsheet-llm-cli.py workbook.xlsx -s "Q4 Results" ``` ## Output Files For input file `example.xlsx` with sheet named "Sheet1", generates: - `example_Sheet1_areas.txt`: Compressed spreadsheet representation - `example_Sheet1_dict.txt`: Value-to-cell coordinate mappings - `example_Sheet1_mapping.json`: Compression metadata and anchors - `example_Sheet1_compressed.xlsx`: Compressed Excel file - `example_Sheet1_recognition.txt`: AI recognition results (if `-r` used) Files include sheet name in filename to distinguish different sheets. Add `_format_aware` suffix when using `-f` flag (e.g., `example_Sheet1_format_aware_areas.txt`). ## Environment Setup ### Required for Recognition ```bash export OPENAI_API_KEY="your-api-key" ``` ### Optional: Custom API Endpoint ```bash export OPENAI_BASE_URL="http://localhost:1234/v1" # For local LLMs ``` ## Example Workflows ### User: "Compress this financial spreadsheet" ```bash uv run tools/spreadsheet-llm-cli.py financial_report.xlsx -f -o results/ ``` ### User: "Find all sales tables in this Excel file" ```bash uv run tools/spreadsheet-llm-cli.py sales_data.xlsx -r --original-coords -p "Identify all sales tables with product and revenue columns" ``` ### User: "Extract data from the Budget sheet" ```bash uv run tools/spreadsheet-llm-cli.py annual_report.xlsx -s "Budget" -r --original-coords ``` ## Understanding Output ### Compression Info The tool displays: ``` ANCHOR INFORMATION: Row anchors: 45 (from 500 original rows) Column anchors: 12 (from 50 original columns) Compression ratio: 9.0% rows, 24.0% columns retained ``` ### Recognition Results Shows: - **Reasoning**: Why certain ranges were identified - **Cell Ranges**: List of meaningful data regions with: - Title/description - Cell range coordinates (e.g., `A1:C10`) - Compressed encoding (for efficient LLM communication) ## Troubleshooting ### Import Error If `spreadsheet_llm` module not found: ```bash cd /Volumes/Yang/dev/github/spreadsheet-agent pip install -e . ``` ### Recognition Issues - Verify `OPENAI_API_KEY` is set: `echo $OPENAI_API_KEY` - Check model name is valid - Ensure network connectivity to API ### Performance Tips - Use `-m google/gemini-2.5-pro` for best performance and value (recommended) - Use `-m gpt-4o-mini` for faster processing with lower accuracy - Process specific sheets with `-s` for large workbooks - Omit `-f` flag for faster compression (if formatting not important) ## Technical Notes - Supports Excel formats: `.xlsx`, `.xlsb`, `.xls` - Compression reduces token usage by 5-10x - Recognition quality improves with format-aware mode (`-f`) - Original coordinates can be preserved with `--original-coords` ## Related Files - CLI script: `tools/spreadsheet-llm-cli.py` - Main package: `packages/spreadsheet_llm/` - Configuration: `pyproject.toml`