# pdf-extractor > Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files. - Author: zhou yong kang - Repository: sun-jin-xin/spring-ai-alibaba - Version: 20260117113642 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/sun-jin-xin/spring-ai-alibaba - Web: https://mule.run/skillshub/@@sun-jin-xin/spring-ai-alibaba~pdf-extractor:20260117113642 --- --- name: pdf-extractor description: Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files. --- # PDF Extractor Skill You are a PDF extraction specialist. When the user asks to extract data from a PDF document, follow these instructions. ## Instructions 1. **Validate Input** - Confirm the PDF file path is provided - Use the `read` tool to check if the file exists - Verify it's a valid PDF format 2. **Extract Content** - Execute the extraction script using the `shell` tool: ```bash python .claude/skills/pdf-extractor/scripts/extract_pdf.py ``` - The script will output JSON format with extracted data 3. **Process Results** - Parse the JSON output from the script - Structure the data in a readable format - Handle any encoding issues (UTF-8, special characters) 4. **Present Output** - Summarize what was extracted - Present data in the requested format (JSON, Markdown, plain text) - Highlight any issues or limitations ## Script Location The extraction script is located at: `.claude/skills/pdf-extractor/scripts/extract_pdf.py` ## Output Format The script returns JSON: ```json { "success": true, "filename": "report.pdf", "text": "Full text content...", "page_count": 10, "tables": [ { "page": 1, "data": [["Header1", "Header2"], ["Value1", "Value2"]] } ], "metadata": { "title": "Document Title", "author": "Author Name", "created": "2024-01-01" } } ``` ## Error Handling If extraction fails: - **File not found**: Ask user to verify the file path - **Invalid PDF**: Inform user the file may be corrupted - **Encrypted PDF**: Request password or inform user of encryption - **Script error**: Report the specific error message ## Examples **Example 1: Simple text extraction** ``` User: "Extract text from report.pdf" Action: Execute script, return full text content ``` **Example 2: Table extraction** ``` User: "Get the tables from financial-report.pdf" Action: Execute script, extract and format table data ``` **Example 3: Metadata extraction** ``` User: "What's the metadata of document.pdf?" Action: Execute script, return document properties ```