POST /vendors/klingai/v1/kling-v2.6/image-to-video/generation
Image to Video Generation
curl --request POST \
  --url https://api.mulerun.com/vendors/klingai/v1/kling-v2.6/image-to-video/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "mode": "pro",
  "duration": 5,
  "image": "https://example.com/image.jpg",
  "prompt": "The man <<<voice_1>>> said, 'Hello, welcome to the show.'",
  "sound": "on",
  "voice_list": [
    {
      "voice_id": "voice_id_1"
    }
  ]
}
EOF
{
  "task_info": {
    "id": "8e1e315e-b50d-4334-a231-be7d19a372f4",
    "status": "pending",
    "created_at": "2025-09-21T00:00:00.000Z",
    "updated_at": "2025-09-21T00:00:00.000Z"
  }
}
This API supports the Kling v2.6 video generation model, including audio and voice generation. Refer to Kling's official documentation for more details.

Overview

Generate videos from images using the Kling v2.6 model with built-in audio and custom voice generation support.

Key Features

  • Image-to-video generation with audio
  • Standard and Professional quality modes
  • 5s or 10s duration
  • End frame control (image_tail)
  • Motion brush support (static_mask, dynamic_masks)
  • Audio generation support (new in v2.6)
  • Custom voice generation (new in v2.6)

Image Requirements

Property        Requirement
Formats         JPEG, JPG, PNG
Dimensions      Min 300px for both width and height
Aspect Ratio    Between 1:2.5 and 2.5:1
File Size       Max 10MB
Input           Public URL or Base64 encoded data
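
The limits in the table above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name `check_image` is hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes, which the source does not specify.

```python
SUPPORTED_FORMATS = {"jpeg", "jpg", "png"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MIN_SIDE_PX = 300


def check_image(fmt, width, height, size_bytes):
    """Return a list of problems; an empty list means the image fits the limits."""
    problems = []
    if fmt.lower().lstrip(".") not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must both be at least 300px")
    # Aspect ratio between 1:2.5 and 2.5:1, written with integers
    # (w/h <= 5/2 and h/w <= 5/2) to avoid floating-point edge cases.
    if width > 0 and height > 0:
        if 2 * width > 5 * height or 2 * height > 5 * width:
            problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    if size_bytes > MAX_BYTES:
        problems.append("file size exceeds 10MB")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single pass.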

Audio & Voice Generation

sound Parameter

  • on: Generate video with synchronized audio
  • off: Generate silent video (default)

voice_list Parameter

Reference custom voices in video generation:
  • Up to 2 voices per task
  • Use the <<<voice_N>>> syntax in the prompt to select a voice (for example, <<<voice_1>>>)
  • Requires sound: "on"

Example Requests

Basic Image-to-Video with Audio

{
  "image": "https://example.com/landscape.jpg",
  "prompt": "Birds chirping as camera pans across the landscape",
  "mode": "std",
  "duration": 5,
  "sound": "on"
}

With Custom Voice

{
  "image": "https://example.com/person.jpg",
  "prompt": "The man <<<voice_1>>> said, 'Hello, welcome to our channel.'",
  "mode": "pro",
  "duration": 5,
  "sound": "on",
  "voice_list": [
    {"voice_id": "your_custom_voice_id"}
  ]
}

With Multiple Voices

{
  "image": "https://example.com/two_people.jpg",
  "prompt": "Person A <<<voice_1>>> says 'Good morning!' and Person B <<<voice_2>>> replies 'Good morning to you too!'",
  "mode": "pro",
  "duration": 10,
  "sound": "on",
  "voice_list": [
    {"voice_id": "voice_id_1"},
    {"voice_id": "voice_id_2"}
  ]
}

With Dynamic Masks

{
  "image": "https://example.com/scene.jpg",
  "prompt": "Object moves along the path with ambient sounds",
  "static_mask": "https://example.com/static_mask.png",
  "dynamic_masks": [
    {
      "mask": "https://example.com/dynamic_mask.png",
      "trajectories": [
        {"x": 100, "y": 200},
        {"x": 300, "y": 400}
      ]
    }
  ],
  "mode": "std",
  "duration": 5,
  "sound": "on"
}

Parameters

sound

  • Options: on, off
  • Default: off
  • Note: Must be on when using voice_list

voice_list

  • Optional: Yes
  • Max Items: 2
  • Description: Custom voice IDs for voice generation
  • Note: Use voice IDs from the custom voice API, not from the Lip-Sync API

Prompt Voice Syntax

Use <<<voice_N>>> in your prompt to specify which voice speaks:
  • <<<voice_1>>> - First voice in voice_list
  • <<<voice_2>>> - Second voice in voice_list (if provided)
Example: The man <<<voice_1>>> said, "Hello."
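
The voice rules above (at most 2 voices, sound set to on, and each <<<voice_N>>> tag matching the Nth entry of voice_list) can be checked client-side before sending a request. This is a sketch under those stated rules only; `check_voices` and `VOICE_TAG` are hypothetical names, and the server may enforce additional constraints not described here.

```python
import re

# Matches tags such as <<<voice_1>>> and captures the index.
VOICE_TAG = re.compile(r"<<<voice_(\d+)>>>")


def check_voices(payload):
    """Return a list of problems with the voice-related fields of a request body."""
    problems = []
    voices = payload.get("voice_list") or []
    if len(voices) > 2:
        problems.append("voice_list allows at most 2 voices")
    if voices and payload.get("sound", "off") != "on":
        problems.append('voice_list requires sound: "on"')
    # Every tag index used in the prompt needs a matching voice_list entry.
    used = {int(n) for n in VOICE_TAG.findall(payload.get("prompt") or "")}
    for n in sorted(used):
        if not 1 <= n <= len(voices):
            problems.append(f"<<<voice_{n}>>> has no matching voice_list entry")
    return problems
```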

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
image
string
required

Reference image. Accepts a Base64-encoded image or an image URL.

Important: When using Base64 encoding, do not add any prefixes such as data:image/png;base64,. Provide only the Base64-encoded string itself.

  • Supported image formats: .jpg, .jpeg, .png
  • Image file size cannot exceed 10MB
  • Width and height dimensions must not be less than 300px
  • Aspect ratio must be between 1:2.5 and 2.5:1
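
Because the image field expects a bare Base64 string with no data-URI prefix, a small helper can help avoid sending the wrong form. The sketch below uses only the Python standard library; the helper names `to_raw_base64` and `strip_data_uri_prefix` are illustrative and not part of the API.

```python
import base64


def to_raw_base64(image_bytes):
    """Encode raw image bytes as a Base64 string with no prefix."""
    return base64.b64encode(image_bytes).decode("ascii")


def strip_data_uri_prefix(value):
    """Drop a leading 'data:<mime>;base64,' prefix if one is present."""
    if value.startswith("data:") and ";base64," in value:
        return value.split(";base64,", 1)[1]
    return value
```

A typical use would be `to_raw_base64(open("frame.png", "rb").read())`, where `frame.png` is a placeholder path.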
image_tail
string | null

End-frame reference image. Accepts a Base64-encoded image or an image URL.

  • At least one of image and image_tail must be provided
  • image + image_tail cannot be combined with dynamic_masks or static_mask
prompt
string

Positive text prompt. Cannot exceed 2500 characters.

Use <<<voice_1>>> to specify the voice, matching the sequence in voice_list. Example: The man <<<voice_1>>> said, "Hello."

Maximum string length: 2500
negative_prompt
string

Negative text prompt. Cannot exceed 2500 characters.

Maximum string length: 2500
voice_list
object[] | null

List of voices referenced when generating videos.

  • A video generation task can reference up to 2 voices
  • When voice_list is not empty and prompt references the voice ID, billing is based on "with voice generation"
Maximum array length: 2
sound
enum<string>
default:off

Generate audio simultaneously when generating videos.

  • on: Enable audio generation (required when using voice_list)
  • off: Disable audio generation (silent video)
Available options: on, off
mode
enum<string>
default:std

Video generation mode

  • std: Standard Mode, the cost-effective option
  • pro: Professional Mode, which takes longer to generate but produces higher-quality video

Available options: std, pro
static_mask
string | null

Static Brush Application Area (Mask image created by users using the motion brush).

  • Accepts a Base64-encoded image or an image URL
  • The aspect ratio of the mask image must match the input image
dynamic_masks
object[] | null

Dynamic Brush Configuration List. Multiple configurations can be set up (up to 6 groups).

Maximum array length: 6
duration
enum<integer>
default:5

Video Length in seconds

Available options: 5, 10
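
The body constraints listed above (other than the voice rules) can be gathered into one pre-flight check. This is a rough sketch of the documented limits only: `validate_payload` is a hypothetical helper, the defaults mirror those stated in this reference, and the server remains the authority on what is accepted.

```python
def validate_payload(p):
    """Return a list of problems found in a request body (a plain dict)."""
    problems = []
    if not p.get("image") and not p.get("image_tail"):
        problems.append("provide image or image_tail")
    # Start frame plus end frame cannot be combined with motion-brush masks.
    uses_masks = bool(p.get("static_mask") or p.get("dynamic_masks"))
    if p.get("image") and p.get("image_tail") and uses_masks:
        problems.append("image + image_tail cannot be combined with masks")
    for key in ("prompt", "negative_prompt"):
        if len(p.get(key) or "") > 2500:
            problems.append(f"{key} exceeds 2500 characters")
    if len(p.get("dynamic_masks") or []) > 6:
        problems.append("dynamic_masks allows at most 6 groups")
    if p.get("sound", "off") not in ("on", "off"):
        problems.append("sound must be on or off")
    if p.get("mode", "std") not in ("std", "pro"):
        problems.append("mode must be std or pro")
    if p.get("duration", 5) not in (5, 10):
        problems.append("duration must be 5 or 10")
    return problems
```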

Response

202 - application/json

Accepted - Task created successfully

task_info
object
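
For callers not using curl, the same request can be assembled with Python's standard library. The sketch below only builds the request and parses a response body shaped like the example at the top of this page; `build_request` and `parse_task` are illustrative names, and the fields read from task_info (id, status) are taken from that example response rather than from a full schema.

```python
import json
import urllib.request

ENDPOINT = "https://api.mulerun.com/vendors/klingai/v1/kling-v2.6/image-to-video/generation"


def build_request(token, payload):
    """Build (but do not send) the POST request for a generation task."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_task(body):
    """Extract (id, status) from a response body containing task_info."""
    info = json.loads(body)["task_info"]
    return info["id"], info["status"]
```

Sending would then be `urllib.request.urlopen(build_request(token, payload))`, with the response body passed to `parse_task`; a successful submission is documented as returning 202.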