POST /vendors/google/v1/veo-3.1/generation
Create Generation Task
curl --request POST \
  --url https://api.mulerun.com/vendors/google/v1/veo-3.1/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "negative_prompt": "blurry, low quality, pixelated",
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "duration": 8
}
'
{
  "task_info": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "pending",
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z"
  }
}
Beta
This model is currently in public beta. Access is limited, and API requests may be unstable.

Overview

Generate high-fidelity, realistic videos with natively generated audio using Google's latest Veo 3.1 model. This version supports reference images and frame interpolation.

Key Features

  • Text-to-Video: Generate videos from descriptive text prompts
  • Image-to-Video: Animate a starting image into a video sequence
  • Frame interpolation: Generate videos by specifying first and last frames
  • Reference images: Use up to 3 reference images to guide style and content
  • Audio generation: Natively generates synchronized audio with video
  • 4K support: Generate videos up to 4K resolution

Supported Configurations

Aspect Ratio  Resolution  Duration Options  Notes
16:9          720p        4s, 6s, 8s        All features supported
9:16          720p        4s, 6s, 8s        All features supported
16:9          1080p       8s only           Reference images supported
9:16          1080p       8s only           Reference images supported
16:9          4k          8s only           Higher latency and cost
9:16          4k          8s only           Higher latency and cost
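
The supported combinations can be checked on the client before a request is sent. The following is a minimal sketch that transcribes the configuration table into a lookup; the names ALLOWED_DURATIONS, ASPECT_RATIOS, and is_supported are illustrative and not part of the API.

```python
# Allowed durations (in seconds) per resolution, transcribed from the
# configuration table. These names are illustrative, not part of the API.
ALLOWED_DURATIONS = {
    "720p": {4, 6, 8},
    "1080p": {8},
    "4k": {8},
}
ASPECT_RATIOS = {"16:9", "9:16"}


def is_supported(aspect_ratio: str, resolution: str, duration: int) -> bool:
    """Return True if the combination appears in the configuration table."""
    if aspect_ratio not in ASPECT_RATIOS:
        return False
    # Unknown resolutions map to an empty set, so they are never supported.
    return duration in ALLOWED_DURATIONS.get(resolution, set())
```

For example, is_supported("16:9", "1080p", 6) is False because 1080p is listed with 8s only.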

Prompt Writing Tips

For best results, include these elements in your prompt:
  • Subject: The main focus (object, person, animal, scenery)
  • Action: What the subject is doing (walking, running, turning)
  • Style: Creative direction (sci-fi, horror film, film noir, cartoon)
  • Camera positioning (optional): aerial view, eye-level, dolly shot
  • Composition (optional): wide shot, close-up, single-shot
  • Ambiance (optional): blue tones, night, warm tones

Audio Prompting

Veo 3.1 can generate synchronized audio. Include audio cues in your prompt:
  • Dialogue: Use quotes for specific speech (e.g., "This must be the key," he murmured)
  • Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
  • Ambient Noise: Describe the environment's soundscape (e.g., a faint, eerie hum)
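
One way to keep prompts consistent is to assemble them from the elements listed above. The helper below is a sketch only: the function name, the "Dialogue:" / "Sound effects:" / "Ambient noise:" labels, and the sentence layout are assumptions chosen for illustration, not a format the model requires. The 2000-character limit comes from the prompt parameter's documented maximum length.

```python
def build_prompt(subject, action, *, style=None, camera=None, composition=None,
                 ambiance=None, dialogue=None, sound_effects=None,
                 ambient_noise=None):
    """Assemble a prompt from visual elements and optional audio cues."""
    # Visual elements: subject and action first, then any optional details.
    visual = [f"{subject} {action}"]
    visual += [v for v in (style, camera, composition, ambiance) if v]
    sentences = [", ".join(visual)]

    # Audio cues: dialogue is wrapped in quotes, as the tips recommend.
    if dialogue:
        sentences.append(f'Dialogue: "{dialogue}"')
    if sound_effects:
        sentences.append(f"Sound effects: {sound_effects}")
    if ambient_noise:
        sentences.append(f"Ambient noise: {ambient_noise}")

    prompt = ". ".join(sentences)
    if len(prompt) > 2000:  # documented maximum length of the prompt field
        raise ValueError("prompt exceeds 2000 characters")
    return prompt
```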

Example Requests

Text-to-Video

{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "negative_prompt": "blurry, low quality, pixelated",
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "duration": 8
}

Image-to-Video

{
  "prompt": "The character starts walking forward slowly",
  "image": "...",
  "aspect_ratio": "9:16",
  "resolution": "720p",
  "duration": 6
}

With Reference Images

{
  "prompt": "A character walking in a city street at night",
  "reference_images": [
    "...",
    "..."
  ],
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "duration": 8
}

Note: When using reference images, duration must be set to 8 seconds.

Interpolation (First and Last Frame)

{
  "prompt": "A smooth transition between two scenes",
  "image": "...",
  "last_frame": "...",
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "duration": 8
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
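
As a sketch of how the header is attached outside of curl, the snippet below builds the same POST request with Python's standard library. The function name build_generation_request is illustrative, and the call that actually sends the request is left commented out so the example has no side effects.

```python
import json
import urllib.request

# Endpoint taken from the curl sample on this page.
API_URL = "https://api.mulerun.com/vendors/google/v1/veo-3.1/generation"


def build_generation_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated generation request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# To send the request (requires a valid token and network access):
# with urllib.request.urlopen(build_generation_request(token, payload)) as resp:
#     print(resp.status, json.load(resp))
```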

Body

application/json
prompt
string
required

Text description for the video. Supports audio cues.

Use descriptive language including:

  • Subject (object, person, animal, scenery)
  • Action (what the subject is doing)
  • Style (sci-fi, horror film, film noir, cartoon, etc.)
  • Camera positioning and motion (optional): aerial view, eye-level, dolly shot
  • Composition (optional): wide shot, close-up, single-shot
  • Ambiance (optional): blue tones, night, warm tones

Audio Prompting:

  • Dialogue: Use quotes for specific speech (e.g., "This must be the key," he murmured)
  • Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
  • Ambient Noise: Describe the environment's soundscape (e.g., a faint, eerie hum)
Maximum string length: 2000
negative_prompt
string

Text describing what not to include in the video.

Do not use instructive language like "no" or "don't". Instead, describe what you don't want to see (e.g., "wall, frame" instead of "No walls").

Maximum string length: 500
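
A simple client-side check can catch both issues before submission. This is a rough sketch: the function name is illustrative, and the pattern only looks for the two words named above ("no" and "don't"), which is an assumption about what counts as instructive language rather than a rule enforced by the API.

```python
import re

MAX_NEGATIVE_PROMPT_LENGTH = 500  # documented maximum string length

# Only the two words named in the guidance; extend as needed (assumption).
INSTRUCTIVE_WORDS = re.compile(r"\b(?:no|don't)\b", re.IGNORECASE)


def negative_prompt_warnings(text: str) -> list[str]:
    """Return human-readable warnings for a negative_prompt value."""
    warnings = []
    if len(text) > MAX_NEGATIVE_PROMPT_LENGTH:
        warnings.append("negative_prompt exceeds 500 characters")
    if INSTRUCTIVE_WORDS.search(text):
        warnings.append("describe unwanted content instead of using 'no' or 'don't'")
    return warnings
```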
image
string | null

Initial image to animate (first frame). Can be a URL or Base64 encoded data.

Format for Base64: data:image/png;base64,{base64_data}

Supported formats: JPEG, JPG, PNG, BMP, WEBP
Max file size: 20MB
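
A local file can be converted to the Base64 form with a small helper. This is a sketch under stated assumptions: the function name is illustrative, "20MB" is interpreted as 20 * 1024 * 1024 bytes, and the MIME type in the prefix is chosen from the file extension even though the documented example only shows image/png.

```python
import base64
from pathlib import Path

# Assumption: "20MB" is read as 20 MiB; the docs do not specify the unit.
MAX_IMAGE_BYTES = 20 * 1024 * 1024

# Assumption: the data URI prefix follows the file type; the documented
# example only shows image/png.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def image_to_data_uri(path) -> str:
    """Encode an image file as data:<mime>;base64,<data>."""
    path = Path(path)
    mime = MIME_BY_SUFFIX.get(path.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {path.suffix}")
    data = path.read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 20MB limit")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the value of image, last_frame, or an item in reference_images.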

last_frame
string | null

Final image (last frame) for an interpolated video. Must be used together with the image parameter.

Format for Base64: data:image/png;base64,{base64_data}

Supported formats: JPEG, JPG, PNG, BMP, WEBP
Max file size: 20MB

reference_images
string[] | null

Up to 3 images to be used as style and content references. Provide images of a person, character, or product to preserve the subject's appearance in the output video.

Each item can be a URL or Base64 encoded data. Format for Base64: data:image/png;base64,{base64_data}

Note: When using reference images, duration must be 8 seconds.

Maximum array length: 3
aspect_ratio
enum<string>
default: 16:9

Video aspect ratio (width:height).

Available options: 16:9, 9:16
resolution
enum<string>
default: 720p

Video resolution.

Note: 1080p and 4k support only the 8-second duration.

Available options: 720p, 1080p, 4k
duration
enum<integer>
default: 8

Length of the generated video in seconds.

Note: Must be 8 when using reference images, 1080p, or 4k resolution.

Available options: 4, 6, 8
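
The cross-parameter rules in this section (duration, reference_images, last_frame) can be checked together before a request is submitted. The following is a minimal sketch, not the server's validation logic: the function name is illustrative, an empty prompt is treated as missing (an assumption), and defaults are filled in from the values documented above.

```python
VALID_DURATIONS = (4, 6, 8)
HIGH_RESOLUTIONS = ("1080p", "4k")


def validate_payload(payload: dict) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    errors = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 2000:
        errors.append("prompt exceeds 2000 characters")

    # Apply the documented defaults when a field is omitted.
    duration = payload.get("duration", 8)
    resolution = payload.get("resolution", "720p")
    references = payload.get("reference_images") or []

    if duration not in VALID_DURATIONS:
        errors.append("duration must be 4, 6, or 8")
    elif duration != 8 and (references or resolution in HIGH_RESOLUTIONS):
        errors.append("duration must be 8 with reference images, 1080p, or 4k")

    if len(references) > 3:
        errors.append("at most 3 reference images are allowed")

    if payload.get("last_frame") and not payload.get("image"):
        errors.append("last_frame requires image")

    return errors
```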

Response

202 - application/json

Accepted - Task created successfully

task_info
object
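
The sample response on this page shows task_info carrying id, status, created_at, and updated_at. A sketch of reading those fields from a 202 response body follows; the TaskInfo class and parse_task_info function are illustrative names, and the field list is taken from the sample rather than a full schema.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskInfo:
    """Fields shown in the sample 202 response; not a complete schema."""
    id: str
    status: str
    created_at: str
    updated_at: str


def parse_task_info(body: str) -> TaskInfo:
    """Extract task_info from a JSON response body."""
    info = json.loads(body)["task_info"]
    return TaskInfo(
        id=info["id"],
        status=info["status"],
        created_at=info["created_at"],
        updated_at=info["updated_at"],
    )
```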