# youtube-transcript-scraper > Scrape transcripts from all videos on a YouTube channel. Use when the user wants to download, extract, or scrape captions/subtitles/transcripts from YouTube channels, playlists, or multiple videos. Handles both manual and auto-generated captions with resume support for large channels. - Author: nicholas - Repository: nnnnicholas/youtube-transcript-scraper - Version: 20260128011255 - Stars: 1 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/nnnnicholas/youtube-transcript-scraper - Web: https://mule.run/skillshub/@@nnnnicholas/youtube-transcript-scraper~youtube-transcript-scraper:20260128011255 --- --- name: youtube-transcript-scraper description: Scrape transcripts from all videos on a YouTube channel. Use when the user wants to download, extract, or scrape captions/subtitles/transcripts from YouTube channels, playlists, or multiple videos. Handles both manual and auto-generated captions with resume support for large channels. --- # YouTube Transcript Scraper Scrapes all video transcripts from a YouTube channel using `yt-dlp`. ## Quick Start ```bash python scripts/scrape_channel.py --output-dir ./output ``` **Accepted inputs:** - Full URL: `https://www.youtube.com/@channelname` - Channel ID: `UCxxxxxxxxxxxxxxxxxx` - Handle: `@channelname` or just `channelname` ## Features - Downloads manual captions when available, falls back to auto-generated - Resumes from previous runs (skips already downloaded) - Rate limiting to avoid being blocked - Parallel downloads (default 3 workers) with backoff on rate limits - Skips YouTube Shorts and short videos (<4 minutes) - Optional batch mode (single yt-dlp run) for faster downloads - Structured JSON output with metadata - Progress logging ## Output Structure ``` output_dir/ ├── metadata.json # Channel info ├── index.json # Progress tracker with all video statuses └── transcripts/ ├── {video_id}.json # Each transcript with metadata └── ... ``` Each transcript file contains: ```json { "video_id": "abc123", "title": "Video Title", "url": "https://youtube.com/watch?v=abc123", "upload_date": "20240115", "duration": 600, "transcript": "Full transcript text...", "word_count": 1234, "scraped_at": "2024-01-15T10:30:00" } ``` ## Options | Flag | Description | Default | |------|-------------|---------| | `--output-dir`, `-o` | Output directory | `./transcripts` | | `--lang`, `-l` | Preferred language code | `en` | | `--delay`, `-d` | Seconds between requests | `1.0` | | `--max-workers` | Max concurrent workers | `3` | | `--mode` | Download mode: `per-video` or `batch` | `per-video` | | `--no-resume` | Re-download everything | (resume enabled) | **Note:** Shorts and short videos are skipped by default (detected by `/shorts/` URLs, `#shorts` titles, or duration < 4 minutes). ## Examples ```bash # Basic usage python scripts/scrape_channel.py https://www.youtube.com/@TechChannel # Spanish transcripts python scripts/scrape_channel.py @SpanishPodcast --lang es -o ./spanish_transcripts # Faster scraping (reduce delay) - use with caution python scripts/scrape_channel.py UCxxxxx --delay 0.5 # Batch mode (single yt-dlp run) python scripts/scrape_channel.py @ChannelHandle --mode batch ``` ## Programmatic Usage ```python from pathlib import Path from scripts.scrape_channel import scrape_channel result = scrape_channel( channel_url="https://www.youtube.com/@channelname", output_dir=Path("./output"), lang="en", resume=True, delay=1.0, max_workers=3, mode="per-video" ) print(f"Downloaded {result['results']['success']} transcripts") ``` ## Troubleshooting See [references/troubleshooting.md](references/troubleshooting.md) for common issues. **Common issues:** - `No videos found`: Check URL format, channel may be private - `No subtitles available`: Video has no captions (manual or auto) - Timeout errors: Increase `--delay`, try again later