POST /vendors/openai/v1/sora-2-pro/generation
Sora 2 Pro Video Generation
curl --request POST \
  --url https://api.mulerun.com/vendors/openai/v1/sora-2-pro/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "seconds": "8",
  "size": "1280x720"
}
'
{
  "task_info": {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "status": "pending",
    "created_at": "2025-09-21T00:00:00.000Z",
    "updated_at": "2025-09-21T00:00:00.000Z"
  }
}
Beta
This model is currently in public testing. Not everyone has access, and API requests may also be unstable.

Overview

Sora 2 Pro is OpenAI’s premium video generation model, producing higher-quality output than the standard Sora 2 model. It creates richly detailed, dynamic clips from natural language prompts or image references.

Key Features

  • Higher-quality video generation than Sora 2
  • Text-to-video and image-to-video generation
  • Additional resolution options in both portrait and landscape formats
  • Ideal for professional content creation

Supported Resolutions

Size       Aspect Ratio         Use Case
720x1280   9:16                 Vertical/Portrait (mobile, social media)
1280x720   16:9                 Horizontal/Landscape (standard video)
1024x1792  4:7 (approx. 9:16)   Tall Portrait (extended vertical)
1792x1024  7:4 (approx. 16:9)   Wide Landscape (cinematic)
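
The supported sizes can be captured in a small client-side lookup. The following Python sketch is illustrative only: the names SUPPORTED_SIZES, parse_size, and orientation are hypothetical helpers, not part of the API. Only the four size strings come from this page.

```python
# The four size strings are taken from the Supported Resolutions list.
# The helper names below are hypothetical and not part of the API.
SUPPORTED_SIZES = ("720x1280", "1280x720", "1024x1792", "1792x1024")


def parse_size(size: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integers, rejecting unlisted sizes."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    width, height = size.split("x")
    return int(width), int(height)


def orientation(size: str) -> str:
    """Return 'portrait' or 'landscape' for a supported size."""
    width, height = parse_size(size)
    return "portrait" if height > width else "landscape"
```

For example, orientation("720x1280") yields "portrait", while parse_size("1920x1080") raises ValueError because that size is not listed.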

Duration Options

Videos can be generated at one of three durations:
  • 4 seconds: Quick clips (default)
  • 8 seconds: Standard duration
  • 12 seconds: Extended clips

Effective Prompting

For best results, describe shot type, subject, action, setting, and lighting. For example:
  • “Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.”
  • “Close-up of a steaming coffee cup on a wooden table, morning light through blinds, soft depth of field.”
This level of specificity helps the model produce consistent results without inventing unwanted details.
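
One way to keep prompts consistently structured is to assemble them from the five elements named above. The sketch below is a hypothetical helper, not something the API provides. The build_prompt name, its parameters, and the sentence template are assumptions; the 2000-character limit is taken from the prompt field's documented maximum length.

```python
MAX_PROMPT_LENGTH = 2000  # documented maximum string length for "prompt"


def build_prompt(shot_type: str, subject: str, action: str,
                 setting: str, lighting: str, camera: str = "") -> str:
    """Assemble a prompt from shot type, subject, action, setting, and lighting.

    The sentence template is an illustrative choice, not an API requirement.
    """
    prompt = f"{shot_type} of {subject} {action} {setting}, {lighting}"
    if camera:
        prompt += f", {camera}"
    prompt += "."
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
    return prompt


example = build_prompt(
    shot_type="Wide shot",
    subject="a child",
    action="flying a red kite",
    setting="in a grassy park",
    lighting="golden hour sunlight",
    camera="camera slowly pans upward",
)
```

With these arguments, example reproduces the first sample prompt shown above.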

Content Restrictions

The API enforces several content restrictions:
  • Content must be suitable for audiences under 18
  • Copyrighted characters and copyrighted music will be rejected
  • Real people, including public figures, cannot be generated
  • Input images containing human faces are currently rejected

Example Requests

Text-to-Video

{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "seconds": "8",
  "size": "1280x720"
}

Image-to-Video (with Reference)

{
  "prompt": "She turns around and smiles, then slowly walks out of the frame",
  "input_reference": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA...",
  "seconds": "8",
  "size": "1280x720"
}
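
Both request shapes can be produced by a single function that adds input_reference only when an image is supplied. This is a minimal Python sketch: generation_payload is a hypothetical name, and the default arguments mirror the defaults listed for seconds and size in the Body section. Note that seconds is passed as a string, as in the examples above.

```python
import json
from typing import Optional


def generation_payload(prompt: str, seconds: str = "4", size: str = "720x1280",
                       input_reference: Optional[str] = None) -> dict:
    """Build a request body; input_reference is included only for image-to-video."""
    payload = {"prompt": prompt, "seconds": seconds, "size": size}
    if input_reference is not None:
        payload["input_reference"] = input_reference
    return payload


text_to_video = generation_payload(
    "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
    seconds="8",
    size="1280x720",
)
body = json.dumps(text_to_video)  # JSON string ready to send as the request body
```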

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
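
As a rough illustration, the header can be attached using only Python's standard library. In this sketch, build_request and submit are hypothetical helper names; the endpoint URL and header names are taken from the curl example at the top of this page. How the token is obtained is outside the scope of this page.

```python
import json
import urllib.request

# URL copied from the curl example on this page.
ENDPOINT = "https://api.mulerun.com/vendors/openai/v1/sora-2-pro/generation"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Create a POST request carrying the Bearer token and a JSON body."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(token: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(token, payload)) as response:
        return json.load(response)
```

Separating build_request from submit keeps the header logic checkable without touching the network.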

Body

application/json
prompt
string
required

Text description for the video. For best results, describe:

  • Shot type (wide shot, close-up, etc.)
  • Subject (what is the main focus)
  • Action (what is happening)
  • Setting (where the action takes place)
  • Lighting (time of day, mood)

Example: "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward."

Maximum string length: 2000
input_reference
string

Optional image reference that guides generation. Either a URL or Base64-encoded data.

Format for Base64: data:image/jpeg;base64,{base64_data}

  • Supported formats: image/jpeg, image/png, image/webp
  • Image resolution must match the target video's size parameter
  • Max file size: 10MB
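
A Base64 reference can be produced from raw image bytes along the following lines. This Python sketch rests on several assumptions, none confirmed by this page: that the data URI pattern shown for image/jpeg also applies to image/png and image/webp, that "10MB" means 10 * 1024 * 1024 bytes, and that the limit applies to the raw file rather than the encoded string. The to_data_uri name is hypothetical. The resolution check is left out because it would require decoding the image.

```python
import base64

SUPPORTED_MIME_TYPES = ("image/jpeg", "image/png", "image/webp")
# Assumption: "10MB" is read as 10 MiB of raw (unencoded) image data.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode image bytes as a 'data:<mime>;base64,<data>' string."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image format: {mime_type!r}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10MB limit")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In practice the bytes would come from a file, for example by opening it in binary mode and calling read(), and the returned string would be passed as input_reference.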

seconds
enum<string>
default:4

Clip duration in seconds.

Available options: 4, 8, 12
size
enum<string>
default:720x1280

Output resolution formatted as width x height.

Available options: 720x1280, 1280x720, 1024x1792, 1792x1024
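
The body constraints above can be checked on the client before a request is sent. The following Python sketch is not part of the API, and validate_body is a hypothetical name. It treats seconds strictly as a string enum, as the schema states; whether the server also accepts bare integers is not documented here.

```python
ALLOWED_SECONDS = ("4", "8", "12")  # string values, per enum<string>
ALLOWED_SIZES = ("720x1280", "1280x720", "1024x1792", "1792x1024")
MAX_PROMPT_LENGTH = 2000


def validate_body(body: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    prompt = body.get("prompt")
    if not isinstance(prompt, str):
        problems.append("prompt is required and must be a string")
    elif len(prompt) > MAX_PROMPT_LENGTH:
        problems.append(f"prompt is longer than {MAX_PROMPT_LENGTH} characters")
    # seconds and size are optional; check them only when present.
    if "seconds" in body and body["seconds"] not in ALLOWED_SECONDS:
        problems.append(f"seconds must be one of {ALLOWED_SECONDS}")
    if "size" in body and body["size"] not in ALLOWED_SIZES:
        problems.append(f"size must be one of {ALLOWED_SIZES}")
    return problems
```

Returning every problem at once, rather than raising on the first, makes it easier to report all issues with a request in one pass.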

Response

202 - application/json

Accepted - Task created successfully

task_info
object
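
A client might decode the 202 response along these lines. The sketch below assumes task_info carries exactly the four fields shown in the sample response on this page (id, status, created_at, updated_at). The TaskInfo class and parse_task_info function are hypothetical names. "pending" is the only status value shown on this page, so the code does not assume any others.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskInfo:
    id: str
    status: str
    created_at: datetime
    updated_at: datetime


def _parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so map it to an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_task_info(response_text: str) -> TaskInfo:
    """Extract the task_info object from a JSON response body."""
    info = json.loads(response_text)["task_info"]
    return TaskInfo(
        id=info["id"],
        status=info["status"],
        created_at=_parse_timestamp(info["created_at"]),
        updated_at=_parse_timestamp(info["updated_at"]),
    )
```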