# audio-transcription

> Transcribe audio and video files to text using Volcengine Doubao ASR Flash API. Use when converting audio recordings, podcasts, meetings, interviews, or videos to text. Supports local files (audio/video), URLs, and auto video-to-audio conversion via FFmpeg.

- Author: zijie
- Repository: runningZ1/audio-transcription-skill
- Version: 20251229114450
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/runningZ1/audio-transcription-skill
- Web: https://mule.run/skillshub/@@runningZ1/audio-transcription-skill~audio-transcription:20251229114450

---

---
name: audio-transcription
description: Transcribe audio and video files to text using Volcengine Doubao ASR Flash API. Use when converting audio recordings, podcasts, meetings, interviews, or videos to text. Supports local files (audio/video), URLs, and auto video-to-audio conversion via FFmpeg.
---

# Audio/Video Transcription - Volcengine Doubao ASR Flash

Transcribe audio and video files to text using the Volcengine large model ASR **Flash API** (synchronous mode).

## Quick Start

```python
import os
from scripts.transcribe import AudioTranscriber

transcriber = AudioTranscriber(
    appid=os.environ["VOLCENGINE_APP_ID"],
    token=os.environ["VOLCENGINE_ACCESS_TOKEN"]
)

# From URL
result = transcriber.transcribe(url="https://example.com/audio.mp3")

# From local audio file
result = transcriber.transcribe(file="./recording.mp3")

# From local video file (auto-converts to audio)
result = transcriber.transcribe(file="./video.mp4")

print(result["result"]["text"])
```

## CLI Usage

```bash
# Set credentials
export VOLCENGINE_APP_ID="your_app_id"
export VOLCENGINE_ACCESS_TOKEN="your_access_token"

# Transcribe from URL
python scripts/transcribe.py --url "https://example.com/audio.mp3"

# Transcribe from local audio file
python scripts/transcribe.py --file "./recording.mp3"

# Transcribe from local video file
python scripts/transcribe.py --file "./video.mp4"

# Text only output
python scripts/transcribe.py --file "./video.mp4" --text-only

# Save to file
python scripts/transcribe.py --file "./video.mp4" -o result.json
```

## Supported Formats

| Type | Formats |
|------|---------|
| **Audio** | mp3, wav, ogg, m4a, flac, aac |
| **Video** | mp4, mkv, avi, mov, wmv, flv, webm, ts, m4v |

## Features

| Feature | Description |
|---------|-------------|
| **Sync Mode** | Single request, immediate response |
| **URL Input** | Public accessible audio URL |
| **Local Audio** | Upload via Base64 encoding |
| **Local Video** | Auto-convert to audio via FFmpeg |
| **Temp Cleanup** | Auto-removes extracted audio files |

## Limits

| Item | Limit |
|------|-------|
| Audio duration | Max 2 hours |
| File size | Max 100MB |
| Upload stream | Recommended < 20MB |

## Requirements

```bash
# Python library
pip install requests

# FFmpeg (required for video files)
# Windows
winget install ffmpeg

# macOS
brew install ffmpeg

# Linux
apt install ffmpeg
```

## Workflow

```
Video File (MP4/MKV/AVI...)
        ↓
  FFmpeg Extract Audio
        ↓
Audio File (MP3/WAV...)  ←  Or direct audio input
        ↓
   ASR API Request
        ↓
   Transcription Text
```

## References

| Topic | File |
|-------|------|
| API Details | [reference/api.md](reference/api.md) |
| Response Format | [reference/response.md](reference/response.md) |
| Examples | [examples/](examples/) |
| Transcribe Script | [scripts/transcribe.py](scripts/transcribe.py) |

## Credentials

Get from [Volcengine Console](https://console.volcengine.com/speech/app):
- `VOLCENGINE_APP_ID`: Application ID
- `VOLCENGINE_ACCESS_TOKEN`: Access Token