# pdf2markdown

> Convert PDF files to Markdown format with optional image extraction. Use when you need to extract text from PDFs, convert PDFs to Markdown, or extract images from PDF documents.

- Author: Al4st41r
- Repository: Al4st41r/Tools
- Version: 20260125115802
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/Al4st41r/Tools
- Web: https://mule.run/skillshub/@@Al4st41r/Tools~pdf2markdown:20260125115802

---

---
name: pdf2markdown
description: Convert PDF files to Markdown format with optional image extraction. Use when you need to extract text from PDFs, convert PDFs to Markdown, or extract images from PDF documents.
---

# PDF to Markdown Converter

## Overview

This skill uses the Pdf2Markdown converter to transform PDF files into clean Markdown format. It supports:

- PDF to Markdown text conversion
- Optional image extraction from PDFs
- Automatic filtering of small images (< 100×100px)
- Preservation of original image formats
- Output to file or stdout

## Prerequisites

Before using this skill, ensure:

- Dependencies are installed: `cd /home/pi/WebApps/Pdf2Markdown && uv sync`
- Python 3.13 is installed
- The input file exists and is readable
- You have write permissions for the output directory

## Quick Start

### Basic Conversion (text only)

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py input.pdf output.md
```

### Conversion with Image Extraction

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py input.pdf output.md --extract-images
```

### Output to Stdout

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py input.pdf
```

## Common Tasks

### Convert a single PDF to Markdown

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py document.pdf document.md
```

**Expected output:**
- `document.md` - Markdown file with extracted text

### Convert PDF with image extraction

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py report.pdf report.md --extract-images
```

**Expected output:**
- `report.md` - Markdown file
  with text and image references
- `report_images/` - Folder containing extracted images

### Batch convert multiple PDFs

```bash
cd /home/pi/WebApps/Pdf2Markdown
for pdf in *.pdf; do
  uv run main.py "$pdf" "${pdf%.pdf}.md" --extract-images
done
```

### Convert other file formats

The tool also supports DOCX, XLSX, PPTX, and HTML files:

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py presentation.pptx output.md
uv run main.py spreadsheet.xlsx output.md
```

## Output Structure

When using `--extract-images` with PDF files:

```
output.md                  # Markdown file with content
output_images/             # Image folder
├── image_001_001.png      # Images from page 1
├── image_002_001.jpg      # Images from page 2
└── image_002_002.png      # Second image from page 2
```

Image references are appended to the markdown:

```markdown
## Extracted Images

![Image 1-1](output_images/image_001_001.png)
![Image 2-1](output_images/image_002_001.jpg)
```

## Important Notes

- **Image extraction** only works with PDF files
- **Small images** (< 100×100 pixels) are automatically filtered out to avoid logos/icons
- **Original formats** are preserved (JPEG, PNG, etc.)
- **Stdout mode** does not support image extraction (requires output file path)
- **Existing image folders** will be cleared and recreated

## Troubleshooting

### Command not found

Ensure you're in the correct directory:

```bash
cd /home/pi/WebApps/Pdf2Markdown
```

### Dependencies missing

Install dependencies:

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv sync
```

### Permission denied

Check file permissions:

```bash
ls -la input.pdf
chmod 644 input.pdf
```

### No images extracted

This is normal if:

- The PDF contains no images
- All images are smaller than 100×100 pixels
- Images are embedded in unsupported formats

### PyMuPDF not installed warning

Install PyMuPDF:

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv add pymupdf
```

## Getting Help

View command-line help:

```bash
cd /home/pi/WebApps/Pdf2Markdown
uv run main.py --help
```

See [REFERENCE.md](REFERENCE.md) for detailed API documentation.
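
## Pre-flight Check (Sketch)

The constraints listed under Important Notes (image extraction is PDF-only, needs an output file path, and clears an existing image folder) can be checked before a conversion is started. The following is a minimal sketch, not part of the Pdf2Markdown tool: `images_dir_for` and `can_extract_images` are hypothetical helper names. It assumes the image folder name is the output path with `.md` replaced by `_images` (inferred from the `report.md` and `report_images/` example), and that a PDF input can be recognised by its file extension.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; these are NOT part of the Pdf2Markdown tool.

# Print the image folder expected for a given Markdown output path.
# Assumption: the folder is the output path with ".md" replaced by "_images",
# as in the documented example (report.md -> report_images/).
images_dir_for() {
  local output="$1"
  printf '%s_images\n' "${output%.md}"
}

# Return 0 if --extract-images makes sense for this input/output pair.
# Assumption: a PDF input is recognised by its file extension.
can_extract_images() {
  local input="$1" output="${2:-}"
  case "$input" in
    *.pdf|*.PDF) ;;
    *)
      echo "image extraction only works with PDF files: $input" >&2
      return 1
      ;;
  esac
  if [ -z "$output" ]; then
    echo "image extraction needs an output file path (not stdout)" >&2
    return 1
  fi
  # Warn (but do not fail) when an existing folder would be overwritten.
  if [ -d "$(images_dir_for "$output")" ]; then
    echo "warning: $(images_dir_for "$output")/ will be cleared and recreated" >&2
  fi
  return 0
}
```

One way to use it, after sourcing the functions in the tool directory: `can_extract_images report.pdf report.md && uv run main.py report.pdf report.md --extract-images`. The check only prints a warning for an existing image folder, so anything worth keeping should be copied elsewhere first.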