POST /vendors/alibaba/v1/wan2.6-t2v/generation
Create Generation Task
curl --request POST \
  --url https://api.mulerun.com/vendors/alibaba/v1/wan2.6-t2v/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "<string>",
  "negative_prompt": "<string>",
  "size": "1280*720",
  "prompt_extend": true,
  "shot_type": "single",
  "audio": true,
  "audio_url": "<string>",
  "seed": 1073741823
}
'
{
  "task_info": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z"
  }
}

This API supports Alibaba Tongyi Wanxiang (Wan2) video generation models. Please refer to Alibaba Cloud’s official documentation for more details.

Overview

Generate videos from text prompts using the wan2.6-t2v model, with durations of up to 15 seconds and optional multi-shot generation.

Key Features

  • Text-to-video generation with audio support
  • Multiple resolution options (720P/1080P)
  • 5s, 10s, or 15s duration
  • Single or multi-shot generation

Resolution Options

720P

  • 1280×720 (16:9)
  • 720×1280 (9:16)
  • 960×960 (1:1)
  • 1088×832 (4:3)
  • 832×1088 (3:4)

1080P

  • 1920×1080 (16:9)
  • 1080×1920 (9:16)
  • 1440×1440 (1:1)
  • 1632×1248 (4:3)
  • 1248×1632 (3:4)
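
The request body expects the size as a "width*height" string rather than as a tier and aspect ratio. A minimal sketch of a lookup built from the tables above follows; the names SIZES and size_for are hypothetical helpers for illustration, not part of the API.

```python
# Supported sizes, copied from the resolution tables above.
SIZES = {
    "720P": {
        "16:9": "1280*720",
        "9:16": "720*1280",
        "1:1": "960*960",
        "4:3": "1088*832",
        "3:4": "832*1088",
    },
    "1080P": {
        "16:9": "1920*1080",
        "9:16": "1080*1920",
        "1:1": "1440*1440",
        "4:3": "1632*1248",
        "3:4": "1248*1632",
    },
}


def size_for(tier: str, aspect: str) -> str:
    """Return the "width*height" value for a tier and aspect ratio (hypothetical helper)."""
    try:
        return SIZES[tier][aspect]
    except KeyError:
        raise ValueError(f"unsupported tier/aspect combination: {tier} {aspect}") from None
```

For example, size_for("1080P", "9:16") yields the value to place in the size field for a vertical 1080P video.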

Example Requests

Basic Text-to-Video

{
  "prompt": "Miyazaki-style mule dancing",
  "size": "1280*720",
  "duration": 5
}

Multi-shot Video

{
  "prompt": "Waves crashing against rocks, water splashing, sunlight on the sea",
  "size": "1920*1080",
  "duration": 10,
  "prompt_extend": true,
  "shot_type": "multi",
  "audio": true
}

With Custom Audio

{
  "prompt": "A person walking through a city at night",
  "size": "1280*720",
  "duration": 15,
  "audio_url": "https://example.com/city_sounds.mp3"
}
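
Any of the bodies above can be submitted from Python as well as from curl. The following is a rough sketch using only the standard library; build_request is a hypothetical helper name, the URL and headers are taken from the curl sample on this page, and the token is a placeholder. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl sample on this page.
API_URL = "https://api.mulerun.com/vendors/alibaba/v1/wan2.6-t2v/generation"


def build_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a generation task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


payload = {"prompt": "Miyazaki-style mule dancing", "size": "1280*720", "duration": 5}
req = build_request(payload, token="<token>")  # replace <token> with a real auth token

# To actually submit the task (performs a network call):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any task is created.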

Parameters

shot_type

  • Default: single
  • Effect: Controls whether the output is a single-shot or multi-shot video
  • Only effective when: prompt_extend (prompt rewriting) is enabled; otherwise the value is ignored
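
Because shot_type depends on prompt_extend, a client may want to flag a request that asks for multi-shot output while prompt rewriting is turned off. One possible client-side check is sketched below; shot_type_warning is a hypothetical helper, and treating a null shot_type as the default is an assumption, not documented behavior.

```python
from typing import Optional


def shot_type_warning(payload: dict) -> Optional[str]:
    """Return a warning when shot_type is set but would be ignored, else None."""
    # Documented defaults: prompt_extend=true, shot_type="single".
    prompt_extend = payload.get("prompt_extend", True)
    # Assumption: a null shot_type behaves like the default "single".
    shot_type = payload.get("shot_type") or "single"
    if shot_type == "multi" and not prompt_extend:
        return "shot_type='multi' has no effect because prompt_extend is disabled"
    return None
```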

duration

  • Options: 5, 10, or 15 seconds
  • Note: Longer durations may increase processing time

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
prompt
string
required

Text description for the desired video content (max 2000 characters).

Maximum string length: 2000
negative_prompt
string

Negative prompt describing unwanted content (max 500 characters).

Maximum string length: 500
size
enum<string>
default:1280*720

Output resolution ("width*height"). Supported tiers:

  • 720P: 1280*720 (16:9), 720*1280 (9:16), 960*960 (1:1), 1088*832 (4:3), 832*1088 (3:4)
  • 1080P: 1920*1080 (16:9), 1080*1920 (9:16), 1440*1440 (1:1), 1632*1248 (4:3), 1248*1632 (3:4)
Available options:
1280*720,
720*1280,
960*960,
1088*832,
832*1088,
1920*1080,
1080*1920,
1440*1440,
1632*1248,
1248*1632
duration
enum<integer>

Video duration in seconds. Supported values: 5, 10, or 15.

Available options:
5,
10,
15
prompt_extend
boolean
default:true

Enable intelligent prompt rewriting (slightly longer latency, better detail).

shot_type
enum<string> | null
default:single

Specifies the shot type for video generation.

Only takes effect when prompt_extend is enabled.

  • single: Default value, outputs single-shot video
  • multi: Outputs multi-shot video
Available options:
single,
multi
audio
boolean | null
default:true

Enable automatic audio generation. Set to false to force a silent output.

audio_url
string<uri> | null

Custom audio file URL (wav/mp3, 3-30s, ≤15MB). Overrides the audio flag.

seed
integer

Random seed, an integer in the range [0, 2147483647].

Required range: 0 <= x <= 2147483647
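
The constraints listed for the body fields can be checked on the client before a request is sent. The sketch below covers only the limits stated above (string lengths, the size and duration enums, and the seed range); validate_payload is a hypothetical helper, and counting prompt length with Python's len() is an assumption about how the service counts characters.

```python
ALLOWED_SIZES = {
    "1280*720", "720*1280", "960*960", "1088*832", "832*1088",
    "1920*1080", "1080*1920", "1440*1440", "1632*1248", "1248*1632",
}
ALLOWED_DURATIONS = {5, 10, 15}
MAX_SEED = 2147483647


def validate_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    errors = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str):
        errors.append("prompt is required and must be a string")
    elif len(prompt) > 2000:  # assumption: the limit is counted like Python's len()
        errors.append("prompt exceeds 2000 characters")

    negative = payload.get("negative_prompt")
    if negative is not None and (not isinstance(negative, str) or len(negative) > 500):
        errors.append("negative_prompt must be a string of at most 500 characters")

    if "size" in payload and payload["size"] not in ALLOWED_SIZES:
        errors.append(f"size {payload['size']!r} is not a supported option")

    if "duration" in payload and payload["duration"] not in ALLOWED_DURATIONS:
        errors.append("duration must be 5, 10, or 15")

    seed = payload.get("seed")
    if seed is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
            errors.append("seed must be an integer in [0, 2147483647]")

    return errors
```

The audio_url limits (wav/mp3, 3-30s, ≤15MB) are not checked here, since verifying them would require fetching the file.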

Response

202 - application/json

Accepted - Task created successfully

task_info
object
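
A client will typically keep the returned task id for later use. The sketch below parses a 202 response body into a small dataclass; the field names id, created_at, and updated_at are taken from the sample response on this page (the schema above does not enumerate them), and TaskInfo and parse_task_info are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskInfo:
    id: str
    created_at: datetime
    updated_at: datetime


def _parse_ts(value: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_task_info(body: str) -> TaskInfo:
    """Extract task_info from a 202 response body (fields assumed from the sample response)."""
    info = json.loads(body)["task_info"]
    return TaskInfo(
        id=info["id"],
        created_at=_parse_ts(info["created_at"]),
        updated_at=_parse_ts(info["updated_at"]),
    )


sample = """
{
  "task_info": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z"
  }
}
"""
task = parse_task_info(sample)
```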