# html2md

> Convert HTML files to Markdown format with intelligent preprocessing. Use when: (1) Converting single HTML file to Markdown, (2) Batch converting HTML files in a directory, (3) Processing saved web pages (SingleFile), (4) Converting documentation sites to Markdown.

- Author: SunBo
- Repository: sunbos/html2md-plugin
- Version: 20260126035822
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/sunbos/html2md-plugin
- Web: https://mule.run/skillshub/@@sunbos/html2md-plugin~html2md:20260126035822

---

---
name: html2md
description: |
  Convert HTML files to Markdown format with intelligent preprocessing.
  Use when: (1) Converting single HTML file to Markdown, (2) Batch converting HTML files in a directory, (3) Processing saved web pages (SingleFile), (4) Converting documentation sites to Markdown.
---

# HTML to Markdown Converter

Production-grade HTML to Markdown converter with dual engine support.

## Quick Start

```bash
# Define script path
SCRIPT="${PLUGIN_DIR}/scripts/html2md.py"

# Convert all HTML files in current directory
python3 "$SCRIPT" .

# Convert specific directory with output folder
python3 "$SCRIPT" ./docs -o ./markdown

# Recursive conversion
python3 "$SCRIPT" ./website -r -o ./output

# Force reconversion (ignore timestamps)
python3 "$SCRIPT" ./docs -f

# Dry run (preview only)
python3 "$SCRIPT" ./docs --dry-run
```

## Common Options

| Option | Description |
|--------|-------------|
| `-o, --output DIR` | Output directory (default: same as input) |
| `-r, --recursive` | Process subdirectories |
| `-f, --force` | Force conversion even if output is newer |
| `--engine {auto,markdownify,html2text}` | Conversion engine |
| `--preset {default,compact,strict}` | Conversion preset |
| `--aggressive` | Aggressive HTML cleaning (removes more elements) |
| `--pattern GLOB` | File pattern (default: `*.html`) |
| `--dry-run` | Preview without converting |
| `-v, --verbose` | Verbose output |
| `-q, --quiet` | Quiet mode (errors only) |
| `-c, --config FILE` | Load settings from YAML config |

## Presets

| Preset | Description |
|--------|-------------|
| `default` | Standard conversion with escape handling |
| `compact` | Minimal escaping, single-line breaks |
| `strict` | Maximum escaping for clean output |

## Dependencies

**Required** (install at least one conversion engine):

### macOS (Homebrew Python 3.13)

```bash
pip3.13 install markdownify html2text --break-system-packages

# Optional
pip3.13 install charset-normalizer tqdm pyyaml --break-system-packages
```

### macOS (Xcode Python 3.9)

```bash
xcrun python3 -m pip install markdownify html2text --user

# Optional
xcrun python3 -m pip install charset-normalizer tqdm pyyaml --user
```

### Linux / Windows

```bash
pip install markdownify html2text

# Optional
pip install charset-normalizer tqdm pyyaml
```

**Optional packages explanation:**

| Package | Feature |
|---------|---------|
| `charset-normalizer` | Auto encoding detection (CJK support) |
| `tqdm` | Progress bar for batch conversion |
| `pyyaml` | YAML config file support |

## Usage Examples

### Single file conversion

```bash
python3 "$SCRIPT" ./page.html
# Creates ./page.md
```

### Batch convert documentation site

```bash
python3 "$SCRIPT" ./docs -r -o ./docs-md --preset compact
```

### Convert SingleFile saved pages

```bash
python3 "$SCRIPT" ~/Downloads --pattern "*.html" --aggressive
```

### Use with config file

```bash
python3 "$SCRIPT" -c config.yaml ./input
```

## Config File Example (config.yaml)

```yaml
engine: auto
preset: default
clean_html: true
aggressive_clean: false
add_title: true
encoding: utf-8
```
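
Before a long batch run it can help to sanity-check a config like the one above. The following is a minimal sketch, not part of `html2md.py`: the helper name `validate_config` is hypothetical, the config is assumed to be already loaded into a plain dict (for example with `pyyaml`), and the allowed `engine` and `preset` values are copied from the Common Options table. How the script itself validates its config is not documented here.

```python
# Allowed values copied from the Common Options table in this document.
ENGINES = {"auto", "markdownify", "html2text"}
PRESETS = {"default", "compact", "strict"}

# Keys shown with true/false values in the example config.
BOOLEAN_KEYS = ("clean_html", "aggressive_clean", "add_title")


def validate_config(config):
    """Return a list of problems found in a config mapping.

    Hypothetical helper for illustration; html2md.py may check differently.
    """
    problems = []
    # Only check keys that are present; missing keys are left to the script.
    if "engine" in config and config["engine"] not in ENGINES:
        problems.append(f"engine: unknown value {config['engine']!r}")
    if "preset" in config and config["preset"] not in PRESETS:
        problems.append(f"preset: unknown value {config['preset']!r}")
    for key in BOOLEAN_KEYS:
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key}: expected true or false")
    return problems


# The example config from this document, as a Python dict.
example = {
    "engine": "auto",
    "preset": "default",
    "clean_html": True,
    "aggressive_clean": False,
    "add_title": True,
    "encoding": "utf-8",
}
print(validate_config(example))  # prints []
```

An empty list means none of the checked keys has an unexpected value; keys the sketch does not know about, such as `encoding`, are passed over.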
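
The `-f, --force` option ("Force conversion even if output is newer") suggests that, without it, a file is skipped when its Markdown output is already newer than the HTML source. A timestamp check of that kind could look roughly like the sketch below. The function name `needs_conversion` is hypothetical, and the exact rule used by `html2md.py` is an assumption; only the `.html` to `.md` naming comes from the single-file example.

```python
import os
import tempfile
from pathlib import Path


def needs_conversion(html_path, md_path, force=False):
    """Decide whether html_path should be (re)converted.

    Assumed rule: convert when forced, when no output exists yet,
    or when the source was modified after the output.
    """
    if force or not md_path.exists():
        return True
    return html_path.stat().st_mtime > md_path.stat().st_mtime


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    html = Path(tmp) / "page.html"
    md = html.with_suffix(".md")  # page.html -> page.md
    html.write_text("<h1>Hi</h1>", encoding="utf-8")
    print(needs_conversion(html, md))  # True: no output yet

    md.write_text("# Hi\n", encoding="utf-8")
    os.utime(html, (1_000, 1_000))  # make the source older...
    os.utime(md, (2_000, 2_000))    # ...than the output
    print(needs_conversion(html, md))              # False: output is newer
    print(needs_conversion(html, md, force=True))  # True: forced
```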