# youtube-research

> Research topics using YouTube video and podcast content. Use when user provides YouTube URLs and wants to (1) bring expert opinions into engineering planning discussions, (2) create structured podcast summaries, or (3) extract specific topics from longer podcasts. Handles transcript fetching, cleaning, and analysis.

- Author: gfreeau
- Repository: gfreeau/claude-skills
- Version: 20251229134941
- Last Updated: 2026-02-06
- Source: https://github.com/gfreeau/claude-skills
- Web: https://mule.run/skillshub/@@gfreeau/claude-skills~youtube-research:20251229134941

---

---
name: youtube-research
description: Research topics using YouTube video and podcast content. Use when user provides YouTube URLs and wants to (1) bring expert opinions into engineering planning discussions, (2) create structured podcast summaries, or (3) extract specific topics from longer podcasts. Handles transcript fetching, cleaning, and analysis.
allowed-tools: Bash(yt-dlp:*), Read, Write, Glob, Grep
---

# YouTube Research

Fetch, clean, and analyze YouTube video/podcast transcripts for engineering planning and knowledge extraction.

## When This Skill Applies

- User provides YouTube URL(s) with a question or request
- User wants expert opinions from talks/podcasts to inform engineering decisions
- User asks for podcast summaries or topic extraction
- User references "that video" or "that podcast" with a URL

## Two Modes of Operation

### Mode 1: Conversational Research

**Trigger**: User asks a question alongside YouTube URL(s)

Examples:
- "What does this talk say about retry logic? [url]"
- "I'm designing auth - here are two talks on the topic [url1] [url2]. What patterns do they recommend?"

**Behavior**:
1. Fetch and clean transcript(s) to temp directory
2. Read the content into context
3. Answer the user's question directly
4. Support follow-up questions naturally

The transcript becomes context for the conversation, not the output.

### Mode 2: Structured Summarization

**Trigger**: User asks to summarize or extract from a podcast

Examples:
- "Summarize this podcast"
- "Extract the section about US market entry from this video"
- "Create a summary of what they discuss about hiring"

**Behavior**:
1. Fetch and clean transcript to temp directory
2. Follow the methodology in `METHODOLOGY.md`
3. Produce a structured markdown document
4. Save summary to current working directory

---

## Working Directory Structure

Use `/tmp/yt-transcripts/` for intermediate files to avoid littering the user's directory.

```bash
mkdir -p /tmp/yt-transcripts
```

| File Type | Location |
|-----------|----------|
| Raw VTT | `/tmp/yt-transcripts/VIDEO_ID.en.vtt` |
| Cleaned transcript | `/tmp/yt-transcripts/VIDEO_ID_clean.txt` |
| Final summary | `./VIDEO_ID_title-slug_summary.md` |

---

## Transcript Fetching Process

### Step 1: Download VTT subtitles

```bash
mkdir -p /tmp/yt-transcripts
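# --write-subs picks up uploader-provided captions; --write-auto-sub covers auto-generated ones
yt-dlp --write-subs --write-auto-sub --sub-lang en --sub-format vtt --skip-download \
  -o "/tmp/yt-transcripts/VIDEO_ID" "YOUTUBE_URL"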
```

This creates `/tmp/yt-transcripts/VIDEO_ID.en.vtt`.

If `en` fails, try `en-orig` (the file is then named `VIDEO_ID.en-orig.vtt`, so adjust the path in Step 2) or check available subtitles:
```bash
yt-dlp --list-subs "YOUTUBE_URL"
```
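The fallback from `en` to `en-orig` can be scripted. The following is a minimal sketch, not part of the original skill: `fetch_subs`, `YT_DLP`, and `OUT_DIR` are hypothetical names. It assumes a missing language may produce no `.vtt` file without a failing exit status, so it checks for the file rather than the exit code.

```bash
#!/usr/bin/env bash
# Sketch only: fetch_subs, YT_DLP and OUT_DIR are illustrative names.
YT_DLP="${YT_DLP:-yt-dlp}"                  # override to use a different binary
OUT_DIR="${OUT_DIR:-/tmp/yt-transcripts}"

# fetch_subs VIDEO_ID YOUTUBE_URL
# Tries `en`, then `en-orig`; prints the path of the first VTT file that appears.
fetch_subs() {
  local video_id="$1" url="$2" lang vtt
  mkdir -p "$OUT_DIR"
  for lang in en en-orig; do
    vtt="$OUT_DIR/$video_id.$lang.vtt"
    "$YT_DLP" --write-subs --write-auto-subs --sub-langs "$lang" --sub-format vtt \
      --skip-download -o "$OUT_DIR/$video_id" "$url" >/dev/null 2>&1 || true
    # Assumption: success is judged by a non-empty file, not by the exit status.
    if [ -s "$vtt" ]; then
      printf '%s\n' "$vtt"
      return 0
    fi
  done
  echo "no English subtitles found; try: $YT_DLP --list-subs \"$url\"" >&2
  return 1
}

# Usage (needs network access):
#   vtt_path="$(fetch_subs VIDEO_ID "YOUTUBE_URL")" || exit 1
```

For several URLs, as in a Mode 1 request with two talks, the function can be called once per video in a loop.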

### Step 2: Clean the VTT file

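Remove the header, timestamp lines, inline tags, and repeated lines. Auto-generated captions repeat each phrase across consecutive cues, so the final `awk` step keeps only the first occurrence of every line. This also drops genuine repeats (for example a second "Yeah."), which is usually acceptable for analysis: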

```bash
# Drop the header, blank or whitespace-only lines, and cue timing lines
grep -v -E '^WEBVTT|^Kind:|^Language:|^[[:space:]]*$|^[0-9]{2}:[0-9]{2}:[0-9]{2}|-->' \
  /tmp/yt-transcripts/VIDEO_ID.en.vtt \
  | sed 's/<[^>]*>//g' \
  | awk '!seen[$0]++' \
  > /tmp/yt-transcripts/VIDEO_ID_clean.txt
```
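To see what the cleaning step does, here is a small worked example. It is a sketch: `clean_vtt` is a hypothetical wrapper around the pipeline (using a pattern that also drops whitespace-only lines), and the sample cues are made up in the style of auto-generated captions.

```bash
#!/usr/bin/env bash
# Sketch: clean_vtt is an illustrative name wrapping the cleaning pipeline.
clean_vtt() {
  grep -v -E '^WEBVTT|^Kind:|^Language:|^[[:space:]]*$|^[0-9]{2}:[0-9]{2}:[0-9]{2}|-->' "$1" \
    | sed 's/<[^>]*>//g' \
    | awk '!seen[$0]++'
}

# Made-up sample: each phrase appears first with inline timing tags,
# then again as plain text in the following cues.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
WEBVTT
Kind: captions
Language: en

00:00:00.160 --> 00:00:02.470 align:start position:0%
welcome<00:00:00.480><c> to</c><00:00:00.640><c> the</c><00:00:00.800><c> show</c>

00:00:02.470 --> 00:00:02.480 align:start position:0%
welcome to the show

00:00:02.480 --> 00:00:04.000 align:start position:0%
welcome to the show
today<00:00:02.800><c> we</c><00:00:03.000><c> cover</c><00:00:03.200><c> retries</c>
EOF

clean_vtt "$sample"
# welcome to the show
# today we cover retries
rm -f "$sample"
```

The timing lines are removed by the `-->` pattern, the inline `<...>` tags by `sed`, and the repeated plain-text copies by `awk`.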

### Step 3: Read the cleaned transcript

Read the full transcript to understand:
- Who is speaking (identify from intro/context)
- Structure of the conversation
- Key topics covered
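
For topic extraction from a long transcript, a keyword search can locate candidate passages before the full read. This is a rough sketch; `find_topic` is a hypothetical helper, not part of the skill. Because caption lines are short and break mid-sentence, single keywords tend to match more reliably than multi-word phrases.

```bash
#!/usr/bin/env bash
# Sketch: find_topic is an illustrative helper.
# find_topic TRANSCRIPT KEYWORD [CONTEXT_LINES]
# Prints matching lines with line numbers and surrounding context
# (case-insensitive, keyword treated as a fixed string).
find_topic() {
  local transcript="$1" keyword="$2" context="${3:-3}"
  grep -n -i -F -C "$context" -- "$keyword" "$transcript"
}

# Example:
#   find_topic /tmp/yt-transcripts/VIDEO_ID_clean.txt "retry" 5
```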

---

## Troubleshooting

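**yt-dlp errors**: yt-dlp is updated frequently to keep up with YouTube changes. If downloads fail, first try upgrading. The command below applies to a pipx install; use `pip install -U yt-dlp` for a pip install or `yt-dlp -U` for the standalone binary: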
```bash
pipx upgrade yt-dlp
```

**No subtitles found**:
- Video may have captions disabled by the uploader
- Try `--sub-lang en-orig` for original English
- Some videos only have auto-generated captions in other languages

**Poor transcript quality**:
- Auto-captions struggle with accents, technical jargon, and multiple speakers
- May need manual correction for accuracy-critical work

---

## Output Naming Convention

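Extract VIDEO_ID from the YouTube URL: the value of the `v=` parameter (up to any following `&`), or the path segment of a `youtu.be` short link (before any `?`). `yt-dlp --print id "YOUTUBE_URL"` returns it directly. Then fetch the video title: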

```bash
yt-dlp --print title "YOUTUBE_URL"
```

Slugify the title (lowercase, hyphens for spaces, remove special characters).

- Full summary: `./VIDEO_ID_title-slug_summary.md`
- Topic-specific: `./VIDEO_ID_title-slug_topic.md`

Examples:
- `rmvDxxNubIg_building-agents-at-scale_summary.md`
- `1wxBY7m6zfI_startup-growth-tactics_us-market-entry.md`
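
The naming convention can be sketched as two small functions. This is one possible implementation: `slugify` and `summary_filename` are hypothetical names. It assumes ASCII titles, since non-ASCII characters are replaced by hyphens along with other special characters.

```bash
#!/usr/bin/env bash
# Sketch: slugify and summary_filename are illustrative names.

# Lowercase, turn every run of non-alphanumeric characters into one hyphen,
# then trim hyphens from both ends.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# summary_filename VIDEO_ID TITLE [TOPIC]   (TOPIC defaults to "summary")
summary_filename() {
  local topic="${3:-summary}"
  printf './%s_%s_%s.md\n' "$1" "$(slugify "$2")" "$(slugify "$topic")"
}

summary_filename rmvDxxNubIg "Building Agents at Scale"
# ./rmvDxxNubIg_building-agents-at-scale_summary.md
summary_filename 1wxBY7m6zfI "Startup Growth Tactics" "US market entry"
# ./1wxBY7m6zfI_startup-growth-tactics_us-market-entry.md
```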