# file-to-markdown

> Convert any file to markdown format using the markitdown library. Use this skill when users need to convert documents (PDF, DOCX, XLSX, PPTX, images, HTML, CSV, JSON, XML, audio files, etc.) into markdown format for easier reading, editing, or integration into markdown-based workflows.

- Author: Paweł Lipowczan
- Repository: plipowczan/anthropics_skills
- Version: 20251224231707
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/plipowczan/anthropics_skills
- Web: https://mule.run/skillshub/@@plipowczan/anthropics_skills~file-to-markdown:20251224231707

---

---
name: file-to-markdown
description: Convert any file to markdown format using the markitdown library. Use this skill when users need to convert documents (PDF, DOCX, XLSX, PPTX, images, HTML, CSV, JSON, XML, audio files, etc.) into markdown format for easier reading, editing, or integration into markdown-based workflows.
license: Complete terms in LICENSE.txt
---

# File to Markdown Converter

Convert files to markdown format using the markitdown library. This skill handles documents, images, audio, structured data, and more.

## When to Use This Skill

Use this skill when the user needs to:

- Convert documents (PDF, DOCX, PPTX, XLSX) to markdown
- Extract text from images using OCR
- Transcribe audio files to text
- Convert structured data (CSV, JSON, XML) to markdown tables
- Process web content (HTML, MHTML) into markdown
- Batch convert multiple files to markdown

## Supported Formats

**Documents**: PDF, DOCX, PPTX, XLSX
**Web**: HTML, MHTML
**Images**: PNG, JPG, JPEG, GIF (with OCR and description)
**Audio**: MP3, WAV (with transcription)
**Data**: CSV, JSON, XML
**Archives**: ZIP
**Other**: Plain text files

## Decision Tree: Choosing Your Approach

```text
User request → Single file or multiple files?
    ├─ Single file → Use helper script
    │   └─ Run: python scripts/convert_file.py <input_file> [output]
    │
    └─ Multiple files → Use batch conversion
        └─ Run: python scripts/batch_convert.py <input_dir> [output_dir] [--pattern PATTERN]
```

## Installation Check

Before converting, make sure markitdown is installed:

```bash
pip install markitdown
```

For full functionality (image OCR, audio transcription), install the optional extras. The quotes keep shells such as zsh from treating the brackets as a glob:

```bash
pip install 'markitdown[all]'
```

## Conversion Workflow

### Single File Conversion

**Use the helper script** as your primary method:

```bash
python scripts/convert_file.py input_file.pdf output.md
```

The script handles:

- File validation
- Conversion with error handling
- Output file creation with proper encoding
- Progress reporting

**If the output filename is omitted**, the script creates `input_file.md` automatically.

### Batch Conversion

**For multiple files**, use the batch converter:

```bash
# Convert all files in a directory
python scripts/batch_convert.py ./documents

# Specify output directory
python scripts/batch_convert.py ./documents ./markdown_output

# Filter by pattern
python scripts/batch_convert.py ./documents ./output --pattern "*.pdf"

# Multiple extensions
python scripts/batch_convert.py ./documents ./output --pattern "*.{pdf,docx}"
```

The batch script:

- Automatically excludes `.md` files
- Provides progress tracking
- Reports success/failure for each file
- Creates output directories as needed

### Direct Python Integration

**When helper scripts don't fit**, use the markitdown library directly:

```python
from markitdown import MarkItDown

# Initialize converter
md = MarkItDown()

# Convert file
try:
    result = md.convert("path/to/file.pdf")
    if result and result.text_content:
        # Process or save markdown
        with open("output.md", "w", encoding="utf-8") as f:
            f.write(result.text_content)
    else:
        print("No content extracted")
except Exception as e:
    print(f"Conversion failed: {e}")
```

## Format-Specific Guidance

### Images (PNG, JPG, GIF)

- markitdown performs OCR to extract text
- Can generate image descriptions using vision models
- Best results with clear, well-lit text
- May not preserve complex layouts perfectly

### Audio (MP3, WAV)

- Automatically transcribed to text
- Requires good audio quality for accuracy
- Processing time increases with file length
- Output formatted as markdown text

### Documents (PDF, DOCX, PPTX, XLSX)

- Text extraction maintains basic structure
- Tables converted to markdown tables
- Some complex formatting may be simplified
- XLSX: each sheet becomes a section with a table

### Structured Data (CSV, JSON, XML)

- CSV: converted to markdown tables
- JSON: formatted as readable text structure
- XML: converted to hierarchical markdown

### Web Content (HTML, MHTML)

- Extracts main content
- Converts HTML to clean markdown
- Preserves links and basic formatting

## Error Handling

**Common errors and solutions:**

1. **ImportError: markitdown not installed**
   - Install with: `pip install markitdown`
   - For full features: `pip install 'markitdown[all]'`
2. **FileNotFoundError**
   - Verify the file path is correct
   - Use absolute paths when uncertain
3. **No content extracted**
   - File may be corrupted or empty
   - Format may not be supported
   - Try a different file to verify the installation
4. **Encoding errors**
   - Always use `encoding='utf-8'` when writing output files
   - Helper scripts handle this automatically

## Best Practices

- **Start with helper scripts**: They handle common cases reliably
- **Test with samples first**: Verify conversion quality before batch processing
- **Use batch converter for large sets**: More efficient than individual conversions
- **Handle errors gracefully**: Not all files convert perfectly
- **Preserve original files**: Conversion is non-destructive, but verify output before deleting sources
- **Check output quality**: Some complex formatting may not translate perfectly

## Reference Files

### scripts/

- **convert_file.py**: Single file conversion with error handling
- **batch_convert.py**: Directory-based batch conversion with pattern matching

### references/

- **markitdown_api.md**: Complete API reference for markitdown library
- **format_guide.md**: Format-specific conversion tips and limitations

**Always run scripts with `--help` first** to see current usage and options.
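
The source of the helper scripts is not reproduced in this document, so the following is only a sketch of how two of the behaviors described above might be implemented with the standard library: the default output name (`input_file.pdf` becomes `input_file.md`) and batch input selection (skip `.md` files, filter by `--pattern`). The function names `default_output_path`, `expand_braces`, and `select_inputs` are hypothetical and are not taken from `convert_file.py` or `batch_convert.py`. The brace expansion is also an assumption: Python's `fnmatch` has no built-in support for patterns such as `*.{pdf,docx}`, so the sketch expands a single brace group by hand.

```python
import re
from fnmatch import fnmatch
from pathlib import Path


def default_output_path(input_path: str) -> Path:
    """Swap the input's final suffix for .md (hypothetical helper)."""
    return Path(input_path).with_suffix(".md")


def expand_braces(pattern: str) -> list[str]:
    """Expand one brace group, e.g. '*.{pdf,docx}' into '*.pdf' and '*.docx'.

    Assumption: a single, non-nested group is enough for this use case.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    return [f"{head}{option}{tail}" for option in match.group(1).split(",")]


def select_inputs(names: list[str], pattern: str = "*") -> list[str]:
    """Keep names that match the pattern and are not already markdown."""
    patterns = expand_braces(pattern)
    return [
        name
        for name in names
        if not name.lower().endswith(".md")
        and any(fnmatch(name, p) for p in patterns)
    ]


if __name__ == "__main__":
    print(default_output_path("input_file.pdf"))
    print(select_inputs(["a.pdf", "notes.md", "b.docx", "c.txt"], "*.{pdf,docx}"))
```

Keeping selection separate from conversion means the file list can be previewed before any conversion runs, which fits the advice to test with samples before batch processing.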