POST /vendors/openai/v1/sora/generation
Sora Video Generation
curl --request POST \
  --url https://api.mulerun.com/vendors/openai/v1/sora/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "model": "sora-2",
  "seconds": "8",
  "size": "1280x720"
}
'
{
  "task_info": {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "status": "pending",
    "created_at": "2025-09-21T00:00:00.000Z",
    "updated_at": "2025-09-21T00:00:00.000Z"
  }
}
Beta
This model is currently in public testing. Access is limited, and API requests may be unstable.

Overview

Sora is OpenAI’s state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images. Built on years of research into multimodal diffusion and trained on diverse visual data, Sora brings a deep understanding of 3D space, motion, and scene continuity to text-to-video generation.

Models

Sora 2

sora-2 is designed for speed and flexibility. It is ideal for the exploration phase, when you are experimenting with tone, structure, or visual style and need quick feedback rather than perfect fidelity. It generates good-quality results quickly, which suits rapid iteration, concepting, and rough cuts. It is often more than sufficient for social media content, prototypes, and scenarios where turnaround time matters more than maximum fidelity.

Supported Resolutions

Size       Aspect Ratio   Use Case
720x1280   9:16           Vertical/Portrait (mobile, social media)
1280x720   16:9           Horizontal/Landscape (standard video)

Duration Options

Videos can be generated in three durations:
  • 4 seconds: Quick clips
  • 8 seconds: Standard duration (default)
  • 12 seconds: Extended clips
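
As a rough illustration of the two lists above, the sketch below checks a size and duration on the client before a request is sent. The function name validate_options and the constant names are hypothetical and not part of the API. The values come from this page, and both are kept as strings because the request examples pass them that way.

```python
# Allowed values copied from this page; the API takes both as strings.
SUPPORTED_SIZES = {"720x1280": "9:16", "1280x720": "16:9"}
SUPPORTED_SECONDS = ("4", "8", "12")


def validate_options(size="1280x720", seconds="8"):
    """Return (size, seconds) as strings, or raise ValueError if unsupported.

    Hypothetical client-side helper; the server performs its own checks.
    """
    seconds = str(seconds)  # accept the integer 8 as well as the string "8"
    if size not in SUPPORTED_SIZES:
        raise ValueError(
            f"unsupported size {size!r}; expected one of {sorted(SUPPORTED_SIZES)}"
        )
    if seconds not in SUPPORTED_SECONDS:
        raise ValueError(
            f"unsupported seconds {seconds!r}; expected one of {SUPPORTED_SECONDS}"
        )
    return size, seconds
```

Calling validate_options("720x1280", 12) returns the pair ("720x1280", "12"), which can be copied directly into the request body.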

Effective Prompting

For best results, describe shot type, subject, action, setting, and lighting. For example:
  • “Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.”
  • “Close-up of a steaming coffee cup on a wooden table, morning light through blinds, soft depth of field.”
This level of specificity helps the model produce consistent results without inventing unwanted details.
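
One way to keep prompts consistent is to assemble them from the elements listed above. The sketch below is only an illustration: build_prompt and its parameter names are hypothetical, and the 2000-character limit is the maximum prompt length given in the Body section of this page.

```python
MAX_PROMPT_LENGTH = 2000  # maximum string length documented for "prompt"


def build_prompt(shot_type, subject_and_action, setting, lighting, camera=None):
    """Join the recommended prompt elements into a single sentence.

    Hypothetical helper; the API only sees the final string.
    """
    parts = [f"{shot_type} of {subject_and_action} {setting}", lighting]
    if camera:
        parts.append(camera)
    prompt = ", ".join(p.strip() for p in parts if p and p.strip()) + "."
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_LENGTH}")
    return prompt


example = build_prompt(
    "Wide shot",
    "a child flying a red kite",
    "in a grassy park",
    "golden hour sunlight",
    camera="camera slowly pans upward",
)
```

Here example reproduces the first sample prompt above.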

Content Restrictions

The API enforces several content restrictions:
  • Only content suitable for audiences under 18 is allowed
  • Copyrighted characters and copyrighted music will be rejected
  • Real people, including public figures, cannot be generated
  • Input images containing human faces are currently rejected

Example Requests

Text-to-Video

{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "model": "sora-2",
  "seconds": "8",
  "size": "1280x720"
}

Image-to-Video

{
  "prompt": "She turns around and smiles, then slowly walks out of the frame",
  "image": "...",
  "model": "sora-2",
  "seconds": "8",
  "size": "1280x720"
}

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
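
For readers calling the endpoint from code rather than curl, here is a minimal sketch of the same request using only Python's standard library. The URL and headers are taken from the curl example on this page. The function name build_request is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.mulerun.com/vendors/openai/v1/sora/generation"


def build_request(token, payload):
    """Build (but do not send) the POST request for a generation task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # same form as the curl example
            "Content-Type": "application/json",
        },
    )


request = build_request(
    "<token>",  # placeholder; substitute your own auth token
    {
        "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
        "model": "sora-2",
        "seconds": "8",
        "size": "1280x720",
    },
)
```

To send it, pass request to urllib.request.urlopen inside a with block and read the JSON body of the response.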

Body

application/json

prompt (string, required)

Text description for the video. For best results, describe:

  • Shot type (wide shot, close-up, etc.)
  • Subject (what is the main focus)
  • Action (what is happening)
  • Setting (where the action takes place)
  • Lighting (time of day, mood)

Example: "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward."

Maximum string length: 2000

model (enum<string>, default: sora-2)

Model name to use for generation.

  • sora-2: Fast, good quality, ideal for rapid iteration and social media content

Available options: sora-2

image (string)

Initial image to use as the first frame of the video. Can be a URL or Base64-encoded data.

Format for Base64: data:image/jpeg;base64,{base64_data}

  • Supported formats: image/jpeg, image/png, image/webp
  • Image resolution must match the target video's size parameter
  • Max file size: 10MB
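
The Base64 form described above can be produced with a few lines of standard-library Python. This is a sketch under stated assumptions: to_data_uri is a hypothetical helper, the page shows the data URI prefix only for image/jpeg so using the same pattern for PNG and WebP is an assumption, and "10MB" is interpreted as 10 * 1024 * 1024 bytes. The helper does not check that the image resolution matches the size parameter.

```python
import base64

SUPPORTED_MIME_TYPES = ("image/jpeg", "image/png", "image/webp")
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def to_data_uri(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a Base64 data URI for the "image" field."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image format {mime_type!r}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(image_bytes)} bytes; limit is {MAX_IMAGE_BYTES}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In practice image_bytes would come from reading the file in binary mode, for example with open(path, "rb").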

seconds (enum<string>, default: 8)

Length of the generated video in seconds.

Available options: 4, 8, 12

size (enum<string>, default: 1280x720)

Video resolution (width x height).

Available options: 720x1280, 1280x720

Response

202 - application/json

Accepted - Task created successfully

task_info (object)
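
As a small sketch of handling the 202 response, the snippet below pulls the task id and status out of a body shaped like the sample response at the top of this page. The function name parse_task_info is hypothetical. How the task is tracked afterwards is not covered in this section, so nothing beyond parsing is shown.

```python
import json


def parse_task_info(response_body):
    """Return (id, status) from a 202 response body given as a JSON string."""
    data = json.loads(response_body)
    task = data["task_info"]  # raises KeyError if the body has another shape
    return task["id"], task["status"]


sample = """
{
  "task_info": {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "status": "pending",
    "created_at": "2025-09-21T00:00:00.000Z",
    "updated_at": "2025-09-21T00:00:00.000Z"
  }
}
"""
task_id, status = parse_task_info(sample)
```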