# stt-integration > ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription. - Author: Angel Ryan - Repository: vanman2024/ai-dev-marketplace - Version: 20260117223534 - Stars: 2 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/vanman2024/ai-dev-marketplace - Web: https://mule.run/skillshub/@@vanman2024/ai-dev-marketplace~stt-integration:20260117223534 --- --- name: stt-integration description: ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription. allowed-tools: Bash, Read, Write, Edit --- # stt-integration This skill provides comprehensive guidance for implementing ElevenLabs Speech-to-Text (STT) capabilities using the Scribe v1 model, which supports 99 languages with state-of-the-art accuracy, speaker diarization for up to 32 speakers, and seamless Vercel AI SDK integration. ## Core Capabilities ### Scribe v1 Model Features - **Multi-language support**: 99 languages with varying accuracy levels - **Speaker diarization**: Up to 32 speakers with identification - **Word-level timestamps**: Precise synchronization for video/audio alignment - **Audio event detection**: Identifies sounds like laughter and applause - **High accuracy**: Optimized for accuracy over real-time processing ### Supported Formats - **Audio**: AAC, AIFF, OGG, MP3, Opus, WAV, WebM, FLAC, M4A - **Video**: MP4, AVI, Matroska, QuickTime, WMV, FLV, WebM, MPEG, 3GPP - **Limits**: Max 3 GB file size, 10 hours duration ## Skill Structure ### Scripts (scripts/) 1. **transcribe-audio.sh** - Direct API transcription with curl 2. **setup-vercel-ai.sh** - Install and configure @ai-sdk/elevenlabs 3. **test-stt.sh** - Test STT with sample audio files 4. **validate-audio.sh** - Validate audio file format and size 5. **batch-transcribe.sh** - Process multiple audio files ### Templates (templates/) 1. **stt-config.json.template** - STT configuration template 2. **vercel-ai-transcribe.ts.template** - Vercel AI SDK TypeScript template 3. **vercel-ai-transcribe.py.template** - Vercel AI SDK Python template 4. **api-transcribe.ts.template** - Direct API TypeScript template 5. **api-transcribe.py.template** - Direct API Python template 6. **diarization-config.json.template** - Speaker diarization configuration ### Examples (examples/) 1. **basic-stt/** - Basic STT with direct API 2. **vercel-ai-stt/** - Vercel AI SDK integration 3. **diarization/** - Speaker diarization examples 4. **multi-language/** - Multi-language transcription 5. **webhook-integration/** - Async transcription with webhooks ## Usage Instructions ### 1. Setup Vercel AI SDK Integration ```bash # Install dependencies bash scripts/setup-vercel-ai.sh # Verify installation npm list @ai-sdk/elevenlabs ``` ### 2. Basic Transcription ```bash # Transcribe a single audio file bash scripts/transcribe-audio.sh path/to/audio.mp3 en # Validate audio before transcription bash scripts/validate-audio.sh path/to/audio.mp3 # Batch transcribe multiple files bash scripts/batch-transcribe.sh path/to/audio/directory en ``` ### 3. Test STT Implementation ```bash # Run comprehensive tests bash scripts/test-stt.sh ``` ### 4. Use Templates ```typescript // Read Vercel AI SDK template Read: templates/vercel-ai-transcribe.ts.template // Customize for your use case // - Set language code // - Configure diarization // - Enable audio event tagging // - Set timestamp granularity ``` ### 5. Explore Examples ```bash # Basic STT example Read: examples/basic-stt/README.md # Vercel AI SDK example Read: examples/vercel-ai-stt/README.md # Speaker diarization example Read: examples/diarization/README.md ``` ## Language Support ### Excellent Accuracy (≤5% WER) 30 languages including: English, French, German, Spanish, Italian, Japanese, Portuguese, Dutch, Polish, Russian ### High Accuracy (>5-10% WER) 19 languages including: Bengali, Mandarin Chinese, Tamil, Telugu, Vietnamese, Turkish ### Good Accuracy (>10-25% WER) 30 languages including: Arabic, Korean, Thai, Indonesian, Hebrew, Czech ### Moderate Accuracy (>25-50% WER) 19 languages including: Amharic, Khmer, Lao, Burmese, Nepali ## Configuration Options ### Provider Options (Vercel AI SDK) - **languageCode**: ISO-639-1/3 code (e.g., 'en', 'es', 'ja') - **tagAudioEvents**: Enable sound detection (default: true) - **numSpeakers**: Max speakers 1-32 (default: auto-detect) - **diarize**: Enable speaker identification (default: true) - **timestampsGranularity**: 'none' | 'word' | 'character' (default: 'word') - **fileFormat**: 'pcm_s16le_16' | 'other' (default: 'other') ### Best Practices 1. **Specify language code** when known for better performance 2. **Use pcm_s16le_16** format for lowest latency with uncompressed audio 3. **Enable diarization** for multi-speaker content 4. **Set numSpeakers** for better accuracy when speaker count is known 5. **Use webhooks** for files >8 minutes for async processing ## Common Patterns ### Pattern 1: Simple Transcription Use direct API or Vercel AI SDK for single-language, single-speaker transcription. ### Pattern 2: Multi-Speaker Transcription Enable diarization and set numSpeakers for interviews, meetings, podcasts. ### Pattern 3: Multi-Language Support Detect language automatically or specify when known for content in 99 languages. ### Pattern 4: Video Transcription Extract audio from video formats and transcribe with timestamps for subtitles. ### Pattern 5: Webhook Integration Process long files asynchronously using webhook callbacks for results. ## Integration with Other ElevenLabs Skills - **tts-integration**: Combine STT → processing → TTS for voice translation workflows - **voice-cloning**: Transcribe existing voice samples before cloning - **dubbing**: Use STT as first step in dubbing pipeline ## Troubleshooting ### Audio Format Issues ```bash # Validate audio format bash scripts/validate-audio.sh your-audio.mp3 ``` ### Language Detection Problems - Specify languageCode explicitly instead of auto-detection - Ensure audio quality is sufficient for chosen language ### Diarization Not Working - Verify numSpeakers is set correctly (1-32) - Check that diarize: true is configured - Ensure audio has clear speaker separation ### File Size/Duration Limits - Max 3 GB file size - Max 10 hours duration - Files >8 minutes are chunked automatically ## Script Reference All scripts are located in `skills/stt-integration/scripts/`: 1. **transcribe-audio.sh** - Main transcription script with curl 2. **setup-vercel-ai.sh** - Install @ai-sdk/elevenlabs package 3. **test-stt.sh** - Comprehensive test suite 4. **validate-audio.sh** - Audio format and size validation 5. **batch-transcribe.sh** - Batch processing for multiple files ## Template Reference All templates are located in `skills/stt-integration/templates/`: 1. **stt-config.json.template** - JSON configuration 2. **vercel-ai-transcribe.ts.template** - TypeScript with Vercel AI SDK 3. **vercel-ai-transcribe.py.template** - Python with Vercel AI SDK 4. **api-transcribe.ts.template** - TypeScript with direct API 5. **api-transcribe.py.template** - Python with direct API 6. **diarization-config.json.template** - Diarization settings ## Example Reference All examples are located in `skills/stt-integration/examples/`: 1. **basic-stt/** - Basic transcription workflow 2. **vercel-ai-stt/** - Vercel AI SDK integration 3. **diarization/** - Speaker identification 4. **multi-language/** - Multi-language support 5. **webhook-integration/** - Async processing --- **Skill Location**: `plugins/elevenlabs/skills/stt-integration/` **Version**: 1.0.0 **Last Updated**: 2025-10-29