# html2md
> Convert HTML files to Markdown format with intelligent preprocessing.
> Use when: (1) Converting a single HTML file to Markdown, (2) Batch converting
> HTML files in a directory, (3) Processing web pages saved with SingleFile,
> (4) Converting documentation sites to Markdown.
- Author: SunBo
- Repository: sunbos/html2md-plugin
- Version: 20260126035822
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/sunbos/html2md-plugin
- Web: https://mule.run/skillshub/@@sunbos/html2md-plugin~html2md:20260126035822
---
---
name: html2md
description: |
  Convert HTML files to Markdown format with intelligent preprocessing.
  Use when: (1) Converting a single HTML file to Markdown, (2) Batch converting
  HTML files in a directory, (3) Processing web pages saved with SingleFile,
  (4) Converting documentation sites to Markdown.
---
# HTML to Markdown Converter
HTML-to-Markdown converter with dual-engine support: conversion runs on either markdownify or html2text, selectable with `--engine`.
## Quick Start
```bash
# Path to the converter script (PLUGIN_DIR must point to the plugin root)
SCRIPT="${PLUGIN_DIR}/scripts/html2md.py"
# Convert all HTML files in current directory
python3 "$SCRIPT" .
# Convert specific directory with output folder
python3 "$SCRIPT" ./docs -o ./markdown
# Recursive conversion
python3 "$SCRIPT" ./website -r -o ./output
# Force reconversion (ignore timestamps)
python3 "$SCRIPT" ./docs -f
# Dry run (preview only)
python3 "$SCRIPT" ./docs --dry-run
```
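The Quick Start builds `SCRIPT` from `PLUGIN_DIR`; if that variable is unset, the path silently becomes `/scripts/html2md.py` and every command fails. A minimal guard, assuming a POSIX-compatible shell; `check_script` is a hypothetical helper, not part of the plugin:

```bash
# Hypothetical helper: succeed only if the converter script exists.
# Prints a hint to stderr and returns 1 otherwise.
check_script() {
    script_path="$1"
    if [ -z "$script_path" ] || [ ! -f "$script_path" ]; then
        echo "html2md script not found: '$script_path' (is PLUGIN_DIR set?)" >&2
        return 1
    fi
    return 0
}
```

Usage: `check_script "$SCRIPT" && python3 "$SCRIPT" .` stops before running Python when the path is wrong.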
## Common Options
| Option | Description |
|--------|-------------|
| `-o, --output DIR` | Output directory (default: same as input) |
| `-r, --recursive` | Process subdirectories |
| `-f, --force` | Convert even if the existing output file is newer than the input |
| `--engine {auto,markdownify,html2text}` | Conversion engine |
| `--preset {default,compact,strict}` | Conversion preset |
| `--aggressive` | Aggressive HTML cleaning (removes more elements) |
| `--pattern GLOB` | File pattern (default: `*.html`) |
| `--dry-run` | Preview without converting |
| `-v, --verbose` | Verbose output |
| `-q, --quiet` | Quiet mode (errors only) |
| `-c, --config FILE` | Load settings from a YAML config file (requires `pyyaml`) |
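The `-f` row implies that, by default, a file is skipped when its Markdown output is already newer than the HTML source. A minimal sketch of that check, assuming a plain modification-time comparison (the script's exact rule, for example on equal timestamps, is not documented here); `needs_conversion` is a hypothetical name:

```bash
# Hypothetical helper mirroring the documented default: skip a file when its
# Markdown output already exists and is newer than the HTML source.
# Pass "force" as the third argument to mimic -f/--force.
needs_conversion() {
    src="$1"
    out="$2"
    mode="${3:-}"
    if [ "$mode" = "force" ]; then
        return 0            # -f: always convert
    fi
    if [ -e "$out" ] && [ "$out" -nt "$src" ]; then
        return 1            # output is newer than the source: skip
    fi
    return 0                # output missing or stale: convert
}
```

This is also why re-running a batch is cheap: only files whose HTML changed since the last run are touched unless `-f` is given.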
## Presets
| Preset | Description |
|--------|-------------|
| `default` | Standard conversion with escape handling |
| `compact` | Minimal escaping, single-line breaks |
| `strict` | Maximum escaping for clean output |
## Dependencies
**Required:** at least one conversion engine (`markdownify` or `html2text`). The commands below install both.
### macOS (Homebrew Python 3.13)
```bash
# Homebrew's Python is marked "externally managed" (PEP 668), so pip needs this flag
pip3.13 install markdownify html2text --break-system-packages
# Optional
pip3.13 install charset-normalizer tqdm pyyaml --break-system-packages
```
### macOS (Xcode Python 3.9)
```bash
xcrun python3 -m pip install markdownify html2text --user
# Optional
xcrun python3 -m pip install charset-normalizer tqdm pyyaml --user
```
### Linux / Windows
```bash
pip install markdownify html2text
# Optional
pip install charset-normalizer tqdm pyyaml
```
**Optional packages:**
| Package | Feature |
|---------|---------|
| `charset-normalizer` | Auto encoding detection (CJK support) |
| `tqdm` | Progress bar for batch conversion |
| `pyyaml` | YAML config file support |
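To see which engines and optional packages an interpreter can actually import, a quick check helps. Two import names differ from their pip names: `charset-normalizer` imports as `charset_normalizer`, and `pyyaml` imports as `yaml`. A sketch; `has_module` and the `PYTHON` variable are hypothetical conveniences, not part of the plugin:

```bash
# Hypothetical helper: succeed if the interpreter (default: python3)
# can import the given module name.
has_module() {
    "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1
}

# Report every package from the tables above, by import name.
for mod in markdownify html2text charset_normalizer tqdm yaml; do
    if has_module "$mod"; then
        echo "ok       $mod"
    else
        echo "missing  $mod"
    fi
done

# The converter needs at least one engine.
if has_module markdownify || has_module html2text; then
    echo "at least one conversion engine is available"
else
    echo "no conversion engine found: install markdownify or html2text" >&2
fi
```

Setting `PYTHON=python3.13` checks the Homebrew interpreter instead of the default `python3`.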
## Usage Examples
### Single file conversion
```bash
python3 "$SCRIPT" ./page.html
# Creates ./page.md
```
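Without `-o`, the output lands next to the input with a `.md` extension. Assuming the name is derived by swapping the `.html` suffix, the mapping is a single shell parameter expansion; `md_path` is a hypothetical helper, and how the real script names files for other extensions or under `-o` is not covered here:

```bash
# Hypothetical helper: derive the Markdown path for an HTML file
# by replacing a trailing ".html" with ".md".
md_path() {
    printf '%s\n' "${1%.html}.md"
}

md_path ./page.html    # prints ./page.md
```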
### Batch convert documentation site
```bash
python3 "$SCRIPT" ./docs -r -o ./docs-md --preset compact
```
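Before a large recursive run, it can help to see how many files the pattern will pick up (`--dry-run` gives the script's own preview). A rough approximation with `find`, assuming `--pattern` is matched against file names the way `find -name` matches them; `list_candidates` is a hypothetical helper:

```bash
# Hypothetical helper: list files under a directory whose names match
# a glob (default "*.html"), descending into subdirectories like -r.
list_candidates() {
    dir="$1"
    pattern="${2:-*.html}"
    find "$dir" -type f -name "$pattern" | sort
}
```

Usage: `list_candidates ./docs | wc -l` gives an approximate file count for `./docs -r`.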
### Convert SingleFile saved pages
```bash
python3 "$SCRIPT" ~/Downloads --pattern "*.html" --aggressive
```
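The pattern is quoted so the shell passes the literal `*.html` to the script; unquoted, the shell expands it against the current directory before the script ever sees it. A small demonstration in a scratch directory (the file names are made up):

```bash
# Print each argument a command would receive, one per line, in brackets.
show_args() {
    printf '[%s]\n' "$@"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a.html" "$demo_dir/b.html"
(
    cd "$demo_dir" || exit 1
    echo "quoted:";   show_args --pattern "*.html"   # [--pattern] then [*.html]
    echo "unquoted:"; show_args --pattern *.html     # [--pattern] then [a.html], [b.html]
)
rm -rf "$demo_dir"
```

In the unquoted case the script would receive `a.html` as the pattern and `b.html` as a stray positional argument.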
### Use with config file
```bash
python3 "$SCRIPT" -c config.yaml ./input
```
## Config File Example (config.yaml)
```yaml
engine: auto
preset: default
clean_html: true
aggressive_clean: false
add_title: true
encoding: utf-8
```
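To keep the config alongside a build script or CI job, it can be generated with a heredoc. A sketch; `write_config` is a hypothetical helper, and the keys are copied verbatim from the example above. How config values interact with flags given on the same command line is not documented here.

```bash
# Hypothetical helper: write the example settings to the given path.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
write_config() {
    cat > "$1" <<'EOF'
engine: auto
preset: default
clean_html: true
aggressive_clean: false
add_title: true
encoding: utf-8
EOF
}
```

Usage: `write_config config.yaml && python3 "$SCRIPT" -c config.yaml ./input`.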