# docx-parsing

> Parse Word documents (.docx) into repository-friendly markdown artifacts. Use when users need to extract content from Word files, convert to markdown, or batch process document directories.

- Author: nikhilmaddirala
- Repository: nikhilmaddirala/gtd-cc
- Version: 20260125170132
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/nikhilmaddirala/gtd-cc
- Web: https://mule.run/skillshub/@@nikhilmaddirala/gtd-cc~docx-parsing:20260125170132

---

---
name: docx-parsing
description: Parse Word documents (.docx) into repository-friendly markdown artifacts. Use when users need to extract content from Word files, convert to markdown, or batch process document directories.
---

# Word Document Parsing

Parse Word documents (.docx) into markdown, JSON, and image artifacts using multi-method extraction.

## Usage

Run the parsing script directly:

```bash
./scripts/parse_docx.py
```

**Example:**

```bash
./scripts/parse_docx.py ~/documents/report.docx ./parsed/
```

The script uses 4 extraction methods:

- python-docx (basic) - Fast text extraction
- python-docx (detailed) - Full structure with tables
- docx2txt - Simple text-only fallback
- markitdown - Microsoft's markdown converter

## Output Structure

```
output_dir/
├── file.docx/
│   ├── parsing_summary.json
│   ├── python_docx_basic/
│   │   └── content.md
│   ├── python_docx_detailed/
│   │   ├── content.md
│   │   ├── tables.json
│   │   └── images/
│   ├── docx2txt/
│   │   └── content.txt
│   └── markitdown/
│       └── content.md
```

## Script Features

- Self-contained Python script with inline uv metadata
- Handles multiple extraction methods for redundancy
- Creates JSON metadata for tables and document structure
- Extracts images with dimensions and metadata
- Continues on errors (one method failure doesn't stop others)
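
The "continues on errors" behavior and the per-method output layout could be sketched roughly as below. This is a minimal illustration, not the actual `parse_docx.py`: the `run_methods` function, the stand-in extractors, and the fields written to `parsing_summary.json` are all assumptions, and for brevity every method writes `content.md` (the real layout uses `content.txt` for docx2txt).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_methods(docx_path: Path, output_dir: Path,
                methods: Dict[str, Callable[[Path], str]]) -> dict:
    """Run each extraction method in isolation and record its outcome.

    Hypothetical helper: the summary fields are illustrative only.
    """
    doc_dir = output_dir / docx_path.name  # mirrors output_dir/file.docx/
    doc_dir.mkdir(parents=True, exist_ok=True)
    summary = {"source": str(docx_path), "methods": {}}

    for name, extract in methods.items():
        try:
            text = extract(docx_path)
            method_dir = doc_dir / name
            method_dir.mkdir(exist_ok=True)
            (method_dir / "content.md").write_text(text, encoding="utf-8")
            summary["methods"][name] = {"status": "ok", "chars": len(text)}
        except Exception as exc:
            # A failing method is recorded, and the loop moves on to the next.
            summary["methods"][name] = {"status": "error", "error": str(exc)}

    (doc_dir / "parsing_summary.json").write_text(
        json.dumps(summary, indent=2), encoding="utf-8")
    return summary


# Stand-in extractors (hypothetical); real ones would call python-docx,
# docx2txt, or markitdown.
def fake_basic(path: Path) -> str:
    return "# Title\n\nBody text"


def fake_broken(path: Path) -> str:
    raise RuntimeError("converter unavailable")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_methods(
            Path("report.docx"), Path(tmp),
            {"python_docx_basic": fake_basic, "markitdown": fake_broken},
        )
        print(json.dumps(result, indent=2))
```

Catching the exception inside the loop, rather than around it, is what keeps one broken converter from discarding the output of the others; the summary file then shows which methods succeeded for a given document.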