# docx
> Comprehensive Word document creation, editing, and analysis. Use when working with .docx files: creating new documents, editing with tracked changes, adding comments, extracting text, or analyzing document structure.
- Author: Will Hunter
- Repository: willhunterrippling/rippling-os-2
- Version: 20260129103157
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/willhunterrippling/rippling-os-2
- Web: https://mule.run/skillshub/@@willhunterrippling/rippling-os-2~docx:20260129103157
---
---
name: docx
description: "Comprehensive Word document creation, editing, and analysis. Use when working with .docx files: creating new documents, editing with tracked changes, adding comments, extracting text, or analyzing document structure."
---
# DOCX Processing
This skill provides capabilities for creating, editing, and analyzing Word documents (.docx files).
## Setup
This skill is located at `~/.cursor/skills/docx`. Set the path before running commands:
```bash
DOCX_SKILL="$HOME/.cursor/skills/docx"
```
## Workflow Decision Tree
### Creating New Document
Use **docx-js** (JavaScript/TypeScript)
### Editing Existing Document
- **Simple changes to your own document**: Basic OOXML editing
- **Someone else's document**: Use **Redlining workflow** (recommended)
- **Legal, academic, business, government docs**: Use **Redlining workflow** (required)
### Reading/Analyzing Content
Use pandoc or raw XML access
---
## Creating New Documents
Use docx-js to create Word documents programmatically.
### Workflow
1. **MANDATORY**: Read the complete docx-js reference: [docx-js.md](docx-js.md)
2. Create JavaScript/TypeScript file using Document, Paragraph, TextRun components
3. Export as .docx using `Packer.toBuffer()`
### Quick Example
```javascript
const { Document, Packer, Paragraph, TextRun } = require('docx');
const fs = require('fs');
const doc = new Document({
sections: [{
children: [
new Paragraph({ children: [new TextRun({ text: "Hello World", bold: true })] })
]
}]
});
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("output.docx", buffer));
```
### Dependencies
```bash
npm install -g docx # If not installed
```
---
## Reading Document Content
### Text Extraction (via pandoc)
```bash
# Convert to markdown with tracked changes preserved
pandoc --track-changes=all document.docx -o output.md
# Options: --track-changes=accept/reject/all
```
### Raw XML Access
For comments, complex formatting, metadata, or embedded media:
```bash
# Unpack document
python "$DOCX_SKILL/ooxml/scripts/unpack.py" document.docx unpacked/
# Key files:
# - word/document.xml - Main content
# - word/comments.xml - Comments
# - word/media/ - Embedded images
```
---
## Editing Documents (Redlining)
Use the Python Document library for tracked changes and comments.
### Workflow
1. **MANDATORY**: Read the complete OOXML reference: [ooxml.md](ooxml.md)
2. Unpack the document:
```bash
python "$DOCX_SKILL/ooxml/scripts/unpack.py" document.docx unpacked/
```
3. Create and run Python script (see Document Library below)
4. Pack the final document:
```bash
python "$DOCX_SKILL/ooxml/scripts/pack.py" unpacked/ output.docx
```
### Document Library Usage
Run scripts with PYTHONPATH set:
```bash
PYTHONPATH="$DOCX_SKILL" python your_script.py
```
```python
from scripts.document import Document, DocxXMLEditor
# Initialize
doc = Document('unpacked')
# Find and replace with tracked changes
node = doc["word/document.xml"].get_node(tag="w:r", contains="old text")
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
replacement = f'{rpr}old text{rpr}new text'
doc["word/document.xml"].replace_node(node, replacement)
# Add comment
start = doc["word/document.xml"].get_node(tag="w:p", contains="target text")
doc.add_comment(start=start, end=start, text="Comment text")
# Save
doc.save()
```
### Key Principle: Minimal Edits
Only mark text that actually changes. Keep unchanged text outside ``/`` tags:
```python
# BAD - replaces entire sentence
'The term is 30 days.The term is 60 days.'
# GOOD - only marks the changed number
'The term is 3060 days.'
```
---
## Converting Documents to Images
```bash
# Step 1: Convert DOCX to PDF
soffice --headless --convert-to pdf document.docx
# Step 2: Convert PDF pages to JPEG
pdftoppm -jpeg -r 150 document.pdf page
# Creates page-1.jpg, page-2.jpg, etc.
```
---
## Dependencies
| Tool | Install Command | Purpose |
|------|-----------------|---------|
| pandoc | `brew install pandoc` | Text extraction |
| docx | `npm install -g docx` | Creating documents |
| LibreOffice | `brew install --cask libreoffice` | PDF conversion |
| Poppler | `brew install poppler` | PDF to images |
| defusedxml | `pip install defusedxml` | Secure XML parsing |
---
## Additional Resources
For complete API details and patterns, read the reference files:
- **Creating documents**: [docx-js.md](docx-js.md)
- **Editing documents**: [ooxml.md](ooxml.md)