# transcriber

> AI-powered audio and video transcription using OpenAI Whisper or AssemblyAI. Use when converting recordings to text, generating subtitles, or creating searchable transcripts.

- Author: Kessa
- Repository: CoachSteff/superskills
- Version: 20260127050145
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/CoachSteff/superskills
- Web: https://mule.run/skillshub/@@CoachSteff/superskills~transcriber:20260127050145

---

---
name: transcriber
description: AI-powered audio and video transcription using OpenAI Whisper or AssemblyAI. Use when converting recordings to text, generating subtitles, or creating searchable transcripts.
version: 1.0.0
---

# Transcriber

> **Note**: Review [PROFILE.md](PROFILE.md) for user-specific transcription preferences, provider settings, and formatting options.
> 
> **Master Briefing**: Global brand voice at `~/.superskills/master-briefing.yaml` applies automatically. Skill profile overrides when conflicts exist.

Convert audio and video to accurate text transcripts with timestamps, perfect for creating course materials, show notes, and searchable content.

## Tools

**Transcriber.py** (in src/):
- Multi-provider support (OpenAI Whisper, AssemblyAI)
- Word-level timestamps for precise timing
- Multiple output formats (TXT, JSON, SRT, VTT)
- Batch processing for multiple files
- Key quote extraction for marketing
- Automatic language detection
- Confidence scoring

## Core Workflow

### 1. File Preparation
- Receive audio/video file(s)
- Validate file format and size
- Determine output requirements (format, timestamps)
- Select transcription provider

### 2. Transcription
- Upload and process file through API
- Capture word-level timestamps if needed
- Detect language automatically
- Extract metadata (duration, word count)
- Generate confidence scores

### 3. Delivery
- Export in requested format (TXT, JSON, SRT, VTT)
- Extract key quotes if needed
- Package with metadata
- Handoff to downstream workflows (narrator, coursepackager, author)

## Usage

**Basic Transcription:**
```python
from superskills.transcriber.src import transcribe_file

result = transcribe_file("recording.mp3")
print(result.transcript)
print(f"Duration: {result.duration_seconds}s")
print(f"Words: {result.word_count}")
```

**With Timestamps:**
```python
from superskills.transcriber.src import Transcriber

transcriber = Transcriber(provider="openai")
result = transcriber.transcribe(
    "session.mp4",
    include_timestamps=True,
    output_format="srt"
)
print(f"Saved to: {result.output_file}")
```

**Batch Processing:**
```python
files = ["session1.mp3", "session2.mp3", "session3.mp3"]
results = transcriber.transcribe_batch(files, output_format="json")

for result in results:
    print(f"{result.source_file}: {result.word_count} words")
```

**Extract Marketing Quotes:**
```python
result = transcriber.transcribe("podcast.mp3", include_timestamps=True)
quotes = transcriber.extract_key_quotes(result, min_words=15, max_quotes=5)

for quote in quotes:
    print(quote)
```

## Output Formats

**TXT**: Plain text transcript  
**JSON**: Full metadata including timestamps and confidence  
**SRT**: Standard subtitle format for video players  
**VTT**: WebVTT format for web video  

## Environment Variables

```bash
# OpenAI Whisper (recommended)
OPENAI_API_KEY=your_openai_api_key

# Or AssemblyAI (alternative)
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
```

**Global .env (repository root):**
```bash
echo "OPENAI_API_KEY=sk-your-key" >> .env
```

**Or skill-specific .env:**
```bash
echo "OPENAI_API_KEY=sk-your-key" >> superskills/transcriber/.env
```

## Quality Checklist

- [ ] Audio quality sufficient (clear speech, minimal background noise)
- [ ] File size within limits (OpenAI: <25MB)
- [ ] Language correctly detected or specified
- [ ] Timestamps accurate if requested
- [ ] Confidence scores reviewed
- [ ] Output format correct for use case
- [ ] Metadata captured (duration, word count)

## Avoid

- **Poor Audio Quality**: Accept any file → Pre-check audio quality and clarity
- **Missing Context**: Generic transcription → Specify language/domain for better accuracy
- **Wrong Format**: Text only → Use SRT/VTT for video subtitles
- **Ignoring Timestamps**: Plain text → Capture timestamps for editing/navigation
- **Large Files**: Single upload → Split files >25MB or use AssemblyAI

## Escalate When

- Audio quality too poor for accurate transcription
- Multiple speakers need identification (diarization)
- Technical jargon requires custom vocabulary
- File size exceeds API limits
- Budget constraints require cost optimization

## Integration Examples

**With Narrator (Podcast Transcripts):**
```python
from superskills.narrator.src import PodcastGenerator
from superskills.transcriber.src import Transcriber

podcast = PodcastGenerator()
result = podcast.generate_podcast(segments, "episode.mp3")

transcriber = Transcriber()
transcript = transcriber.transcribe("episode.mp3", output_format="txt")
print(transcript.transcript)
```

**With Marketer (Social Snippets):**
```python
from superskills.transcriber.src import Transcriber
from superskills.marketer.src import SocialMediaPublisher

transcriber = Transcriber()
result = transcriber.transcribe("training.mp4", include_timestamps=True)

quotes = transcriber.extract_key_quotes(result, min_words=15, max_quotes=3)

publisher = SocialMediaPublisher()
for quote in quotes:
    publisher.schedule_post(quote, platforms=["TWITTER", "LINKEDIN"])
```

**With CoursePackager (Searchable Transcripts):**
```python
from superskills.transcriber.src import Transcriber

transcriber = Transcriber()
results = transcriber.transcribe_batch(
    ["lesson1.mp4", "lesson2.mp4", "lesson3.mp4"],
    output_format="json"
)
```