# docx

> Comprehensive Word document creation, editing, and analysis. Use when working with .docx files: creating new documents, editing with tracked changes, adding comments, extracting text, or analyzing document structure.

- Author: Will Hunter
- Repository: willhunterrippling/rippling-os-2
- Version: 20260129103157
- Last Updated: 2026-02-06
- Source: https://github.com/willhunterrippling/rippling-os-2
- Web: https://mule.run/skillshub/@@willhunterrippling/rippling-os-2~docx:20260129103157

---

---
name: docx
description: "Comprehensive Word document creation, editing, and analysis. Use when working with .docx files: creating new documents, editing with tracked changes, adding comments, extracting text, or analyzing document structure."
---

# DOCX Processing

This skill provides capabilities for creating, editing, and analyzing Word documents (.docx files).

## Setup

This skill is located at `~/.cursor/skills/docx`. Set the path before running commands:

```bash
DOCX_SKILL="$HOME/.cursor/skills/docx"
```

## Workflow Decision Tree

### Creating New Document

Use **docx-js** (JavaScript/TypeScript)

### Editing Existing Document

- **Simple changes to your own document**: Basic OOXML editing
- **Someone else's document**: Use **Redlining workflow** (recommended)
- **Legal, academic, business, government docs**: Use **Redlining workflow** (required)

### Reading/Analyzing Content

Use pandoc or raw XML access

---

## Creating New Documents

Use docx-js to create Word documents programmatically.

### Workflow

1. **MANDATORY**: Read the complete docx-js reference: [docx-js.md](docx-js.md)
2. Create a JavaScript/TypeScript file using the Document, Paragraph, and TextRun components
3. Export as .docx using `Packer.toBuffer()`

### Quick Example

```javascript
const { Document, Packer, Paragraph, TextRun } = require('docx');
const fs = require('fs');

// One section containing a single bold paragraph
const doc = new Document({
  sections: [{
    children: [
      new Paragraph({
        children: [new TextRun({ text: "Hello World", bold: true })]
      })
    ]
  }]
});

// Serialize the document and write it to disk
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("output.docx", buffer));
```

### Dependencies

```bash
npm install -g docx  # If not installed
```

---

## Reading Document Content

### Text Extraction (via pandoc)

```bash
# Convert to markdown with tracked changes preserved
pandoc --track-changes=all document.docx -o output.md

# Options: --track-changes=accept/reject/all
```

### Raw XML Access

For comments, complex formatting, metadata, or embedded media:

```bash
# Unpack document
python "$DOCX_SKILL/ooxml/scripts/unpack.py" document.docx unpacked/

# Key files:
# - word/document.xml - Main content
# - word/comments.xml - Comments
# - word/media/ - Embedded images
```

---

## Editing Documents (Redlining)

Use the Python Document library for tracked changes and comments.

### Workflow

1. **MANDATORY**: Read the complete OOXML reference: [ooxml.md](ooxml.md)
2. Unpack the document:
   ```bash
   python "$DOCX_SKILL/ooxml/scripts/unpack.py" document.docx unpacked/
   ```
3. Create and run a Python script (see Document Library Usage below)
4. Pack the final document:
   ```bash
   python "$DOCX_SKILL/ooxml/scripts/pack.py" unpacked/ output.docx
   ```

### Document Library Usage

Run scripts with PYTHONPATH set:

```bash
PYTHONPATH="$DOCX_SKILL" python your_script.py
```

```python
from scripts.document import Document, DocxXMLEditor

# Initialize from the unpacked directory
doc = Document('unpacked')

# Find and replace with tracked changes
node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
# Reuse the run's formatting (w:rPr), if it has any
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
replacement = f'<w:del><w:r>{rpr}<w:delText>old text</w:delText></w:r></w:del><w:ins><w:r>{rpr}<w:t>new text</w:t></w:r></w:ins>'
doc["word/document.xml"].replace_node(node, replacement)

# Add comment
start = doc["word/document.xml"].get_node(tag="w:p", contains="target text")
doc.add_comment(start=start, end=start, text="Comment text")

# Save
doc.save()
```

### Key Principle: Minimal Edits

Only mark text that actually changes. Keep unchanged text outside `<w:del>`/`<w:ins>` tags:

```python
# BAD - replaces entire sentence
'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>'

# GOOD - only marks the changed number
'<w:r><w:t>The term is </w:t></w:r><w:del><w:r><w:delText>30</w:delText></w:r></w:del><w:ins><w:r><w:t>60</w:t></w:r></w:ins><w:r><w:t> days.</w:t></w:r>'
```

---

## Converting Documents to Images

```bash
# Step 1: Convert DOCX to PDF
soffice --headless --convert-to pdf document.docx

# Step 2: Convert PDF pages to JPEG
pdftoppm -jpeg -r 150 document.pdf page
# Creates page-1.jpg, page-2.jpg, etc.
```

---

## Dependencies

| Tool | Install Command | Purpose |
|------|-----------------|---------|
| pandoc | `brew install pandoc` | Text extraction |
| docx | `npm install -g docx` | Creating documents |
| LibreOffice | `brew install --cask libreoffice` | PDF conversion |
| Poppler | `brew install poppler` | PDF to images |
| defusedxml | `pip install defusedxml` | Secure XML parsing |

---

## Additional Resources

For complete API details and patterns, read the reference files:

- **Creating documents**: [docx-js.md](docx-js.md)
- **Editing documents**: [ooxml.md](ooxml.md)
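
---

## Sketch: Building a Minimal Tracked Change

The minimal-edit principle from the redlining section can be illustrated with a small helper that compares two versions of a sentence word by word and wraps only the differing middle in `<w:del>`/`<w:ins>`. This is a rough sketch, not part of the skill's Document library: `minimal_tracked_change` and `_run` are hypothetical names, run properties (`w:rPr`) are ignored, and revision attributes such as `w:id`, `w:author`, and `w:date` are left out on the assumption that the library or a later step supplies them (check [ooxml.md](ooxml.md) before relying on that).

```python
import re
from xml.sax.saxutils import escape


def _run(text: str, tag: str = "w:t") -> str:
    """Wrap text in a run; xml:space="preserve" keeps leading/trailing spaces."""
    return f'<w:r><{tag} xml:space="preserve">{escape(text)}</{tag}></w:r>'


def minimal_tracked_change(old: str, new: str) -> str:
    """Hypothetical helper: mark only the words that differ between old and new.

    Revision attributes (w:id, w:author, w:date) and w:rPr are omitted here.
    """
    # Split into words and whitespace, keeping the whitespace as tokens.
    a = re.split(r"(\s+)", old)
    b = re.split(r"(\s+)", new)

    # Length of the common token prefix.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1

    # Length of the common token suffix that does not overlap the prefix.
    s = 0
    while s < min(len(a), len(b)) - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1

    prefix = "".join(a[:p])
    suffix = "".join(a[len(a) - s:])
    deleted = "".join(a[p:len(a) - s])
    inserted = "".join(b[p:len(b) - s])

    parts = []
    if prefix:
        parts.append(_run(prefix))  # unchanged text stays outside the markup
    if deleted:
        parts.append(f"<w:del>{_run(deleted, 'w:delText')}</w:del>")
    if inserted:
        parts.append(f"<w:ins>{_run(inserted)}</w:ins>")
    if suffix:
        parts.append(_run(suffix))
    return "".join(parts)


if __name__ == "__main__":
    print(minimal_tracked_change("The term is 30 days.", "The term is 60 days."))
```

For the "30 days" to "60 days" case, only `30` ends up inside `<w:del>` and only `60` inside `<w:ins>`; the surrounding words are emitted as plain runs. Comparing whole words rather than characters keeps a number like `30` from being split into a changed `3` and an unchanged `0`.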