# markdown

> Convert any document format TO Markdown. Supports 14 formats (PDF, DOCX, XLSX, PPTX, HTML, CSV, EPUB, MSG, and more) via unified CLI. Use when Claude needs to read or extract text from non-Markdown files.

- Author: sarukas
- Repository: sarukas/claude-skill-markdown
- Version: 20260209104331
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-09
- Source: https://github.com/sarukas/claude-skill-markdown
- Web: https://mule.run/skillshub/@@sarukas/claude-skill-markdown~markdown:20260209104331

---

---
name: markdown
description: Convert any document format TO Markdown. Supports 14 formats (PDF, DOCX, XLSX, PPTX, HTML, CSV, EPUB, MSG, and more) via unified CLI. Use when Claude needs to read or extract text from non-Markdown files.
---

# Markdown - Document-to-Markdown Conversion

Convert documents to Markdown for reading, analysis, and processing.

## Decision Tree

```
User Request
|
+-- Convert file to Markdown
|   +-- Single file --> scripts/convert_to_md.py input.pdf
|   +-- With explicit output --> scripts/convert_to_md.py input.pdf output.md
|   +-- Batch directory --> scripts/convert_to_md.py -d ./folder/ -r [-t pdf docx]
|   +-- Check available formats --> scripts/convert_to_md.py --list-formats
|   +-- Check dependencies --> scripts/convert_to_md.py --check-deps [format]
|
+-- Read/analyze document content
|   +-- Convert first, then analyze the Markdown output
|
+-- XLSX with specific sheets
|   +-- scripts/convert_to_md.py data.xlsx --sheets Sheet1 Sheet2
```

## Single File Conversion

```bash
python scripts/convert_to_md.py report.pdf
python scripts/convert_to_md.py report.pdf output.md
python scripts/convert_to_md.py data.xlsx --sheets Sheet1
```

When no output path is given, the output is written next to the input file, using the same name with a `.md` extension.
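The default output rule can be pictured with a short Python sketch. This is only an illustration of the naming behavior described above, not the script's actual code; the function name `default_output_path` is hypothetical.

```python
from pathlib import Path


def default_output_path(input_path: str) -> Path:
    """Return the input path with its extension swapped for .md.

    Hypothetical helper: mirrors the documented default
    (same directory, same base name, .md extension).
    """
    return Path(input_path).with_suffix(".md")


if __name__ == "__main__":
    # Show where each conversion would land by default.
    for name in ("report.pdf", "contracts/2024/lease.docx"):
        print(name, "->", default_output_path(name))
```

Passing an explicit second argument, as in `report.pdf output.md`, bypasses this default.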
## Batch Conversion

```bash
python scripts/convert_to_md.py -d ./contracts/ -r            # All supported types, recursive
python scripts/convert_to_md.py -d ./contracts/ -t pdf docx   # Only PDF and DOCX
python scripts/convert_to_md.py -d ./contracts/ -o ./output/  # Custom output directory
python scripts/convert_to_md.py -d ./contracts/ --no-skip     # Re-convert even if .md exists
```

## Info Commands

```bash
python scripts/convert_to_md.py --list-formats     # Show all formats + dependency status
python scripts/convert_to_md.py --check-deps       # Check all dependencies
python scripts/convert_to_md.py --check-deps pdf   # Check PDF deps only
```

## Supported Formats

| Format | Extensions | Library | Notes |
|--------|-----------|---------|-------|
| PDF | .pdf | pymupdf4llm + pdfplumber | Best table extraction, dual-engine |
| XLSX | .xlsx | openpyxl | Sheet selection, formula preservation |
| XLS | .xls | markitdown | Legacy Excel |
| DOCX | .docx | markitdown | Word documents |
| PPTX | .pptx | markitdown | PowerPoint slides |
| HTML | .html, .htm | html2text + BeautifulSoup | Table preservation |
| CSV/TSV | .csv, .tsv | stdlib csv | Auto-detect delimiter |
| EPUB | .epub | markitdown | E-books |
| MSG | .msg | markitdown | Outlook messages |
| IPYNB | .ipynb | markitdown | Jupyter notebooks |
| JSON | .json | markitdown | Structured data |
| XML | .xml | markitdown | Structured markup |
| ZIP | .zip | markitdown | Archive contents |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp | markitdown | OCR/description |
| Audio | .mp3, .wav | markitdown | Transcription |

**14 formats, 27 extensions total.**

## Format-Specific Options

### PDF

- Dual-engine: pymupdf4llm (primary) with pdfplumber fallback for tables
- Large files are chunked automatically

### XLSX

- `--sheets Sheet1 Sheet2`: Convert only specific sheets
- Preserves table structure with headers

### HTML

- Strips scripts/styles, preserves tables and links
- Handles both local files and saved web pages

### CSV/TSV

- Auto-detects the delimiter (comma, tab, semicolon, pipe)
- Outputs a Markdown table

## Dependencies

Each format has its own requirements file in `scripts/converters/`:

```bash
# Install all dependencies
pip install -r scripts/converters/requirements-all.txt

# Or install per-format
pip install -r scripts/converters/requirements-pdf.txt
pip install -r scripts/converters/requirements-xlsx.txt
pip install -r scripts/converters/requirements-html.txt
pip install -r scripts/converters/requirements-csv.txt
pip install -r scripts/converters/requirements-markitdown.txt  # DOCX, XLS, PPTX, EPUB, MSG, etc.
```

Core dependencies:

- **PDF**: `pymupdf pymupdf4llm pdfplumber`
- **XLSX**: `openpyxl`
- **HTML**: `beautifulsoup4 html2text`
- **CSV**: stdlib (no install needed)
- **Markitdown formats**: `markitdown`

## Troubleshooting

**"Unsupported file extension"**

- Run `--list-formats` to see the supported extensions
- Check that the file has the correct extension

**"Missing dependencies"**

- Run `--check-deps [format]` to see what's needed
- Install with pip as shown above

**Large PDF produces poor output**

- The converter uses a dual-engine approach; pdfplumber handles complex tables better
- For scanned PDFs, OCR support depends on markitdown

**XLSX tables look wrong**

- Try specifying `--sheets` to convert individual sheets
- Very wide tables may wrap in Markdown

**Verbose logging**

```bash
python scripts/convert_to_md.py -v report.pdf   # Debug-level logging
python scripts/convert_to_md.py -q report.pdf   # Suppress informational output
```
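
When a CSV/TSV conversion looks wrong, it can help to reproduce the delimiter detection in isolation. The sketch below shows one way to auto-detect among the four delimiters listed under CSV/TSV and render a Markdown table, using only the standard library's `csv.Sniffer`. It is an assumption about the general technique, not the converter's actual code; `detect_delimiter` and `to_markdown_table` are hypothetical names.

```python
import csv
import io

# The four delimiters this README lists for CSV/TSV auto-detection.
CANDIDATE_DELIMITERS = ",\t;|"


def detect_delimiter(sample: str) -> str:
    """Guess the delimiter of a text sample, falling back to a comma."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS)
        return dialect.delimiter
    except csv.Error:
        # Sniffer could not decide (e.g. a single-column file).
        return ","


def to_markdown_table(text: str) -> str:
    """Render delimited text as a Markdown table (first row is the header)."""
    delimiter = detect_delimiter(text[:4096])
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def render(row):
        # Escape literal pipes and pad short rows to the table width.
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([render(header), separator, *(render(r) for r in body)])


if __name__ == "__main__":
    print(to_markdown_table("name;qty\nwidget;3\ngadget;5\n"))
```

If the sniffed delimiter is not the one you expect, the file likely mixes delimiters or has too few rows for reliable detection.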