# audio-to-video

> Generate video from audio + image using fal.ai LTX-2. Use for: talking head, lip sync, audio-driven video.

- Author: aviz85
- Repository: aviz85/ai-music-video-maker
- Version: 20260203113441
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/aviz85/ai-music-video-maker
- Web: https://mule.run/skillshub/@@aviz85/ai-music-video-maker~audio-to-video:20260203113441

---

---
name: audio-to-video
description: "Generate video from audio + image using fal.ai LTX-2. Use for: talking head, lip sync, audio-driven video."
allowed-tools: Bash, Read, Write
---

# Audio to Video

Generate a video from an audio file and an optional starting image using fal.ai LTX-2 19B.

## Usage

```bash
cd .claude/skills/audio-to-video/scripts
npx ts-node generate.ts \
  --audio "/path/to/speech.mp3" \
  --image "/path/to/face.png" \
  -d /tmp/output.mp4 \
  "A woman speaks to camera, natural lighting"
```

## Required Flags

| Flag | Description |
|------|-------------|
| `--audio`, `-a` | Audio file (mp3, wav, ogg, m4a, aac) |
| `-d`, `--destination` | Output video path |

## Optional Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--image`, `-i` | - | Starting frame image |
| `--end-image` | - | Ending frame image |
| `--size`, `-s` | `landscape_16_9` | Video size |
| `--fps` | 25 | Frames per second |
| `--quality` | high | low, medium, high, maximum |
| `--camera` | none | dolly_in, dolly_out, jib_up, jib_down, static |
| `--no-match-length` | - | Don't auto-match video to audio duration |

## Video Sizes

`landscape_16_9`, `landscape_4_3`, `portrait_16_9`, `portrait_4_3`, `square_hd`, `square`, `auto`

## Limits

- **Max frames:** 481
- **Max duration at 25fps:** ~19 seconds
- **Max duration at 24fps:** ~20 seconds
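
The durations above follow from dividing the frame cap by the frame rate. A minimal sketch of that arithmetic, assuming duration is simply frames divided by fps (`maxDurationSeconds` is a hypothetical helper, not part of `generate.ts`):

```typescript
// Frame cap taken from the Limits list above.
const MAX_FRAMES = 481;

// Hypothetical helper: longest clip length in seconds for a given frame rate,
// assuming duration = frames / fps.
function maxDurationSeconds(fps: number, maxFrames: number = MAX_FRAMES): number {
  if (!(fps > 0)) {
    throw new Error("fps must be a positive number");
  }
  return maxFrames / fps;
}

console.log(maxDurationSeconds(25).toFixed(2)); // 19.24
console.log(maxDurationSeconds(24).toFixed(2)); // 20.04
```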

For longer videos, generate multiple clips and concatenate with ffmpeg.
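
One way to plan the multi-clip approach is to split the audio into whole-second segments that each stay under the frame cap, then write a list file for ffmpeg's concat demuxer. The sketch below is illustrative only: `planClips` and `buildConcatList` are hypothetical helpers, not part of this skill, and cutting the audio into segments is left to you.

```typescript
interface ClipPlan {
  start: number;    // segment start within the audio, in seconds
  duration: number; // segment length, in seconds
}

// Hypothetical helper: split an audio duration into segments that each fit
// within the frame cap. Whole seconds are used to stay safely below the limit.
function planClips(audioSeconds: number, fps: number = 25, maxFrames: number = 481): ClipPlan[] {
  const maxSeconds = Math.floor(maxFrames / fps);
  if (!(audioSeconds > 0) || maxSeconds < 1) {
    return [];
  }
  const clips: ClipPlan[] = [];
  for (let start = 0; start < audioSeconds; start += maxSeconds) {
    clips.push({ start, duration: Math.min(maxSeconds, audioSeconds - start) });
  }
  return clips;
}

// Hypothetical helper: build the text of a list file for ffmpeg's concat demuxer.
// Single quotes inside a path are escaped as '\'' per ffmpeg's quoting rules.
function buildConcatList(paths: string[]): string {
  return paths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

console.log(planClips(45));
console.log(buildConcatList(["/tmp/clip-0.mp4", "/tmp/clip-1.mp4", "/tmp/clip-2.mp4"]));
```

After saving the list text to a file such as `list.txt`, `ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4` joins the clips without re-encoding. Stream copy assumes all clips share the same codec, resolution, and frame rate, which should hold when they are generated with identical flags.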

## Pricing

About $0.001 per megapixel, counted across all frames. Example: 1280 × 720 × 121 frames ≈ 111.5 megapixels ≈ $0.11.
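
The example figure can be reproduced with a short calculation. This is a rough estimate, assuming the approximate rate quoted above applies to width × height × frames; `estimateCostUsd` is a hypothetical helper, and actual billing is determined by fal.ai.

```typescript
// Hypothetical helper: estimate cost from total pixels across all frames.
// The default rate is the approximate figure quoted in this section.
function estimateCostUsd(
  width: number,
  height: number,
  frames: number,
  usdPerMegapixel: number = 0.001
): number {
  const megapixels = (width * height * frames) / 1_000_000;
  return megapixels * usdPerMegapixel;
}

console.log(estimateCostUsd(1280, 720, 121).toFixed(2)); // 0.11
```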

## API Key

The script reads `FAL_KEY` from `~/.claude/skills/image-generation/scripts/.env`, the `.env` file of the image-generation skill.
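
How `generate.ts` actually loads this file is not shown here. As a rough illustration of what reading `FAL_KEY` from a `.env` file involves, the sketch below parses `KEY=value` lines from text; `readEnvValue` is a hypothetical helper, and the real script may use a library such as dotenv instead.

```typescript
// Hypothetical helper: return the value for `key` from .env-style text,
// or undefined if the key is absent. Handles comments, blank lines,
// an optional `export ` prefix, and matching surrounding quotes.
function readEnvValue(envText: string, key: string): string | undefined {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const name = line.slice(0, eq).trim().replace(/^export\s+/, "");
    if (name !== key) continue;
    const value = line.slice(eq + 1).trim();
    const quoted = value.match(/^(["'])(.*)\1$/);
    return quoted ? quoted[2] : value;
  }
  return undefined;
}

// Placeholder value for illustration only, not a real key.
const sample = '# fal.ai credentials\nFAL_KEY="example-id:example-secret"\n';
console.log(readEnvValue(sample, "FAL_KEY")); // example-id:example-secret
```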