# article-extractor

> Extract clean article content from URLs (blog posts, articles, tutorials) and save as readable text. Use when user wants to download, extract, or save an article/blog post from a URL without ads, navigation, or clutter.

- Author: Nigel Ferrer
- Repository: nferrer-dev/claude-dotfiles
- Version: 20260202165651
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/nferrer-dev/claude-dotfiles
- Web: https://mule.run/skillshub/@@nferrer-dev/claude-dotfiles~article-extractor:20260202165651

---

---
name: article-extractor
description: Extract clean article content from URLs (blog posts, articles, tutorials) and save as readable text. Use when user wants to download, extract, or save an article/blog post from a URL without ads, navigation, or clutter.
allowed-tools: Bash,Write
---

# Article Extractor

Extracts main content from web articles, removing navigation, ads, and clutter.

## When to Use

- User provides an article/blog URL and wants the text
- User asks to "download this article" or "extract content from [URL]"
- Need clean article text for analysis

## Tool Priority

1. **reader** (Mozilla Readability): best all-around
2. **trafilatura**: best for blogs/news, non-English
3. **Fallback**: curl + basic HTML parsing

## Installation

```bash
# Option 1 (recommended)
npm install -g @mozilla/readability-cli

# Option 2
pip3 install trafilatura
```

## Workflow

```bash
ARTICLE_URL="$1"

# Detect tool
if command -v reader &> /dev/null; then
    TOOL="reader"
elif command -v trafilatura &> /dev/null; then
    TOOL="trafilatura"
else
    TOOL="fallback"
fi

# Extract
case $TOOL in
    reader)
        reader "$ARTICLE_URL" > temp_article.txt
        TITLE=$(head -n 1 temp_article.txt | sed 's/^# //')
        ;;
    trafilatura)
        METADATA=$(trafilatura --URL "$ARTICLE_URL" --json)
        TITLE=$(echo "$METADATA" | python3 -c "import json, sys; print(json.load(sys.stdin).get('title', 'Article'))")
        trafilatura --URL "$ARTICLE_URL" --output-format txt --no-comments > temp_article.txt
        ;;
    fallback)
        TITLE=$(curl -s "$ARTICLE_URL" | grep -oP '