Veo3 Video Generation

Beta

This model is currently in public beta testing. Access is limited, and API requests may be unstable.

Overview

Generate high-fidelity, 8-second 720p or 1080p videos with stunning realism and natively generated audio using Google’s Veo 3.1 model.

Key Features

Text-to-Video: Generate videos from descriptive text prompts
Image-to-Video: Animate a starting image into a video sequence
Frame-specific generation: Generate videos by specifying first and last frames (interpolation)
Reference images: Use up to 3 reference images to guide content (Veo 3.1 only)
Audio generation: Natively generates synchronized audio with video

Supported Aspect Ratios and Resolutions

Aspect Ratio	Resolution	Duration Options
16:9	720p	4s, 6s, 8s
16:9	1080p	8s
9:16	720p	4s, 6s, 8s
9:16	1080p	8s

Note: 1080p resolution only supports 8-second duration.
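
The resolution and duration constraint can be checked client-side before a request is sent. The following is a minimal sketch; the helper name `allowed_durations` is hypothetical and not part of the API.

```python
# Durations (in seconds, as strings) accepted for each resolution,
# following the table and note in this section.
ALLOWED_DURATIONS = {
    "720p": ("4", "6", "8"),
    "1080p": ("8",),
}


def allowed_durations(resolution):
    """Return the durations accepted for a resolution (hypothetical helper)."""
    try:
        return ALLOWED_DURATIONS[resolution]
    except KeyError:
        raise ValueError(f"unsupported resolution: {resolution!r}") from None
```

Calling `allowed_durations("1080p")` returns `("8",)`, so a caller can reject a 4-second or 6-second 1080p request locally.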

Prompt Writing Tips

For best results, include these elements in your prompt:

Subject: The main focus (object, person, animal, scenery)
Action: What the subject is doing (walking, running, turning)
Style: Creative direction (sci-fi, horror film, film noir, cartoon)
Camera positioning (optional): aerial view, eye-level, dolly shot
Composition (optional): wide shot, close-up, single-shot
Ambiance (optional): blue tones, night, warm tones
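
One way to keep prompts consistent is to assemble them from the elements listed above. This is only an illustrative sketch: `build_prompt` and the comma-joined ordering are assumptions, and the API simply receives the resulting string.

```python
def build_prompt(subject, action, style, camera=None, composition=None, ambiance=None):
    """Join the prompt elements into a single descriptive string.

    Hypothetical helper: subject, action, and style are required,
    the remaining elements are appended only when provided.
    """
    parts = [f"{subject} {action}", f"{style} style"]
    for optional in (camera, composition, ambiance):
        if optional:
            parts.append(optional)
    return ", ".join(parts)


prompt = build_prompt(
    "A lone astronaut",
    "walking across a red desert",
    "sci-fi",
    camera="aerial view",
    ambiance="warm tones",
)
```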

Audio Prompting

Veo 3 can generate synchronized audio. Include audio cues in your prompt:

Dialogue: Use quotes for specific speech (e.g., “This must be the key,” he murmured)
Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
Ambient Noise: Describe the environment’s soundscape (e.g., a faint, eerie hum)
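
Audio cues are plain text inside the prompt, so they can be appended programmatically. The sketch below assumes a hypothetical helper `add_audio_cues`; the sentence layout it produces is one possible phrasing, not a required format.

```python
def add_audio_cues(prompt, dialogue=None, sound_effects=(), ambient=None):
    """Append dialogue, sound effects, and ambient noise to a prompt.

    Hypothetical helper: dialogue is wrapped in quotes, as recommended above.
    """
    parts = [prompt.rstrip(". ")]
    if dialogue:
        parts.append(f'A voice says, "{dialogue}"')
    parts.extend(sound_effects)
    if ambient:
        parts.append(ambient)
    return ". ".join(parts) + "."
```

For example, `add_audio_cues("A detective opens a drawer", dialogue="This must be the key", ambient="A faint, eerie hum")` yields a prompt with the quoted line of dialogue followed by the ambient description.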

Example Requests

Text-to-Video

{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "negative_prompt": "blurry, low quality, pixelated",
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "duration": "8"
}

Image-to-Video

{
  "prompt": "The character starts walking forward slowly",
  "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...",
  "aspect_ratio": "9:16",
  "resolution": "720p",
  "duration": "6"
}

With Reference Images (Veo 3.1 only)

{
  "prompt": "A character walking in a city street at night",
  "reference_images": [
    "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...",
    "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."
  ],
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "duration": "8"
}

Interpolation (First and Last Frame)

{
  "prompt": "A smooth transition between two scenes",
  "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...",
  "last_frame": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...",
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "duration": "8"
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
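
As a rough illustration, an authenticated request could be assembled with the Python standard library as shown below. The endpoint URL is a placeholder (this section does not state it), the use of POST is assumed, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder endpoint: the real URL is not given in this section.
API_URL = "https://api.example.com/veo3/generation"


def build_request(token, payload):
    """Build a JSON request carrying the Bearer token (not sent here)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed: the section does not name the HTTP method
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is omitted because the endpoint above is not real.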

Body

application/json

prompt

string

required

Text description for the video. Supports audio cues.

Use descriptive language including:

Subject (object, person, animal, scenery)
Action (what the subject is doing)
Style (sci-fi, horror film, film noir, cartoon, etc.)
Camera positioning and motion (optional): aerial view, eye-level, dolly shot
Composition (optional): wide shot, close-up, single-shot
Ambiance (optional): blue tones, night, warm tones

Maximum string length: 2000

model

enum<string>

default: veo-3.1

Model name to use for generation.

veo-3.1: Veo 3.1 model

Available options: veo-3.1, veo-3.1-fast, veo-3

negative_prompt

string

Text describing what not to include in the video.

Do not use instructive language like "no" or "don't". Instead, describe what you don't want to see (e.g., "wall, frame" instead of "No walls").

Maximum string length: 500
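
A simple client-side check can catch the two documented problems before a request is sent. This is a sketch under assumptions: `check_negative_prompt` is a hypothetical helper, and the pattern only looks for "no", "don't", and "do not".

```python
import re

# Instructive words named in the description above, matched as whole words.
INSTRUCTIVE = re.compile(r"\b(no|don't|do not)\b", re.IGNORECASE)


def check_negative_prompt(text):
    """Return a list of problems found in a negative prompt (empty if none)."""
    problems = []
    if len(text) > 500:
        problems.append("longer than 500 characters")
    if INSTRUCTIVE.search(text):
        problems.append("uses instructive wording; list the unwanted things instead")
    return problems
```

With this check, `"wall, frame"` passes, while `"No walls"` is flagged.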

image

string | null

Initial image to animate. Can be a URL or Base64 encoded data.

Format for Base64: data:image/png;base64,{base64_data}

Supported formats: JPEG, JPG, PNG, BMP, WEBP
Max file size: 10MB
Image dimensions: 360 to 2000 pixels for both width and height
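
A local file can be turned into the Base64 form with the standard library. The sketch below makes several assumptions: `image_to_data_uri` is a hypothetical helper, 10MB is taken as 10 * 1024 * 1024 bytes, and non-PNG files are given their own MIME type in the prefix, whereas the documented format only shows `image/png`. It does not check pixel dimensions, which would need an image library.

```python
import base64
from pathlib import Path

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10MB"

# File suffixes for the supported formats, mapped to assumed MIME types.
MIME_BY_SUFFIX = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def image_to_data_uri(path):
    """Encode an image file as a data URI for the `image` parameter."""
    file = Path(path)
    mime = MIME_BY_SUFFIX.get(file.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {file.suffix!r}")
    data = file.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10MB limit")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```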

last_frame

string | null

Final image for interpolation video. Must be used in combination with the image parameter.

Format for Base64: data:image/png;base64,{base64_data}

Supported formats: JPEG, JPG, PNG, BMP, WEBP
Max file size: 10MB

reference_images

string[] | null

Up to 3 images to be used as style and content references.

Only supported by Veo 3.1

Each item can be a URL or Base64 encoded data. Format for Base64: data:image/png;base64,{base64_data}

Maximum array length: 3
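
The two limits on reference images can be verified before submission. This is a minimal sketch with a hypothetical helper, `check_reference_images`. It rejects only `veo-3`; whether `veo-3.1-fast` accepts reference images is not stated here, so the sketch leaves that decision to the server.

```python
def check_reference_images(model, reference_images):
    """Raise ValueError if reference images break the documented limits."""
    if not reference_images:
        return  # nothing to check when the parameter is null or empty
    if model == "veo-3":
        raise ValueError("reference_images is only supported by Veo 3.1")
    if len(reference_images) > 3:
        raise ValueError("at most 3 reference images are allowed")
```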

aspect_ratio

enum<string>

default: 16:9

Video aspect ratio (width:height).

Available for both 720p and 1080p resolutions.

Available options: 16:9, 9:16

resolution

enum<string>

default: 720p

Video resolution.

Note: 1080p only supports 8-second duration.

Available options: 720p, 1080p

duration

enum<string>

default: 8

Length of the generated video in seconds.

Note: Must be "8" when using interpolation (first and last frames), reference images, or 1080p resolution.

Available options: 4, 6, 8
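
The duration rules scattered across the parameters can be gathered into one check on the request body. The following is a sketch, not the server's validation logic; `check_duration` is a hypothetical helper that applies the documented defaults when a field is missing.

```python
def check_duration(payload):
    """Raise ValueError if `duration` conflicts with the other fields."""
    duration = payload.get("duration", "8")        # documented default
    resolution = payload.get("resolution", "720p")  # documented default
    if duration not in {"4", "6", "8"}:
        raise ValueError(f"invalid duration: {duration!r}")

    # Collect every feature that forces an 8-second video.
    reasons = []
    if resolution == "1080p":
        reasons.append("1080p resolution")
    if payload.get("last_frame"):
        reasons.append("interpolation")
    if payload.get("reference_images"):
        reasons.append("reference images")

    if reasons and duration != "8":
        raise ValueError(f'duration must be "8" with {", ".join(reasons)}')
```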

Response

202 - application/json

Accepted - Task created successfully

task_info

object
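
A client might handle the response along these lines. This is a sketch with a hypothetical helper, `parse_accepted`; the child attributes of `task_info` are not listed in this section, so the function returns the object unchanged rather than assuming any field names.

```python
import json


def parse_accepted(status, body):
    """Return the `task_info` object from a 202 response body."""
    if status != 202:
        raise RuntimeError(f"expected 202 Accepted, got {status}")
    data = json.loads(body)
    task_info = data.get("task_info")
    if not isinstance(task_info, dict):
        raise ValueError("response has no task_info object")
    return task_info
```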

