# ppt-knowledge-doc

> Turn PowerPoint files (.pptx/.ppt) into a structured knowledge document (Markdown + JSON). Use when you need to extract slide text, speaker notes, and images; OCR text inside screenshots/diagrams; and generate an executive summary, key points, glossary, and slide-by-slide notes.

- Author: umshere
- Repository: umshere/visual-knowledge-extractor
- Version: 20260130130737
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/umshere/visual-knowledge-extractor
- Web: https://mule.run/skillshub/@@umshere/visual-knowledge-extractor~ppt-knowledge-doc:20260130130737

---

Convert a PowerPoint deck into a **knowledge document**.

This skill is optimized for a repeatable, local CLI pipeline (no UI required):

- Extract slide text + speaker notes
- Extract embedded images
- OCR images (diagrams/screenshots)
- Produce:
  - `knowledge.json` (structured)
  - `knowledge.md` (download/share)

## Quick start (CLI)

```bash
# From the repo root:
cd skills/ppt-knowledge-doc/scripts
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python ppt_kd_cli.py /path/to/deck.pptx --out /tmp/ppt-kd-out
# outputs: /tmp/ppt-kd-out/knowledge.json and knowledge.md
```

## What it generates

- Executive summary
- Key concepts / glossary (heuristic)
- Processes / steps (best-effort from OCR text)
- Slide-by-slide notes:
  - slide title
  - extracted text
  - speaker notes
  - OCR text from images

## Notes / Dependencies

- OCR uses **Tesseract** if installed.
- Optional conversions (for hard PPTs): **LibreOffice** (`soffice`) and **Poppler** (`pdftoppm`).

See `references/system-capabilities.md`.
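
Because Tesseract, LibreOffice, and Poppler are all optional, it can help to check which of them are on `PATH` before running the pipeline. The snippet below is a minimal sketch and not part of this skill's scripts: the helper names `check_tools` and `format_report` are hypothetical, and `tesseract` is assumed to be the executable name for Tesseract (the notes above name only `soffice` and `pdftoppm` explicitly).

```python
import shutil

# Executable names for the optional tools. "soffice" and "pdftoppm" come from
# the notes above; "tesseract" is an assumed binary name for Tesseract.
OPTIONAL_TOOLS = {
    "tesseract": "OCR of embedded images",
    "soffice": "LibreOffice conversion",
    "pdftoppm": "Poppler PDF-to-image rendering",
}


def check_tools(tools=OPTIONAL_TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in tools}


def format_report(found, tools=OPTIONAL_TOOLS):
    """Render one line per tool: name, resolved path or 'not found', purpose."""
    lines = []
    for name, path in found.items():
        status = path if path else "not found"
        lines.append(f"{name:<10} {status}  ({tools[name]})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(check_tools()))
```

A tool reported as `not found` is not necessarily an error here, since all three are described as optional; the report only indicates which steps may be unavailable.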