Kling 3 Omni integration

Generate AI videos with multi-modal inputs: text prompts, reference images, and element consistency for characters and objects.
Kling 3 Omni is a versatile video generation API that supports multiple input modes: text-to-video, image-to-video, and reference-to-video with elements and images. It offers advanced features like multi-shot mode for scene-by-scene control and element consistency to maintain character/object identity across frames.
Looking for video-to-video? Use the dedicated Reference-to-Video endpoints to generate videos from a reference video using video_url.

Key capabilities

  • Text-to-video: Generate videos from text prompts up to 2500 characters
  • Image-to-video: Use image_url for start frame, end_image_url for end frame control
  • Element consistency: Pre-register characters/objects with elements and reference as @Element1, @Element2 in prompts
  • Reference images: Add style guidance with image_urls, reference as @Image1, @Image2 in prompts
  • Multi-shot mode: Create multi-scene videos with multi_prompt for shot-by-shot control
  • Duration control: Generate videos 3 to 15 seconds long
  • Audio options: Generate native audio or use voice IDs for narration
  • Async processing: Webhook notifications or polling for task completion

Generation modes

Mode | Parameters | Use case
Text-to-video | prompt (required) | Generate video from a text description
Image-to-video | image_url + prompt | Animate a starting image
Reference-to-video | elements and/or image_urls + prompt | Maintain character/style consistency
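
As a rough illustration of how the three modes differ only in which fields are sent, the sketch below builds a request body and reports which mode it corresponds to. The helper names build_payload and detect_mode are hypothetical, and the precedence used when several inputs are combined is an assumption, not documented API behavior.

```python
from typing import Any, Optional


def build_payload(
    prompt: str,
    image_url: Optional[str] = None,
    elements: Optional[list[dict[str, Any]]] = None,
    image_urls: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble a request body containing only the fields that were supplied."""
    payload: dict[str, Any] = {"prompt": prompt}
    if image_url is not None:
        payload["image_url"] = image_url
    if elements:
        payload["elements"] = elements
    if image_urls:
        payload["image_urls"] = image_urls
    return payload


def detect_mode(payload: dict[str, Any]) -> str:
    """Name the generation mode implied by the fields present.

    Assumption: elements/image_urls take precedence over image_url
    when both are present; the source does not specify this.
    """
    if payload.get("elements") or payload.get("image_urls"):
        return "reference-to-video"
    if payload.get("image_url"):
        return "image-to-video"
    return "text-to-video"


if __name__ == "__main__":
    body = build_payload(
        "The fox turns its head toward the camera",
        image_url="https://example.com/fox.png",
    )
    print(detect_mode(body), body)
```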

Pro vs Standard

Feature | Kling 3 Omni Pro | Kling 3 Omni Standard
Quality | Higher fidelity output | Good quality, cost-effective
Speed | Standard processing | Faster processing
Best for | Premium productions | Testing, high-volume

Use cases

  • Character animation: Maintain consistent character identity across video with elements
  • Product visualization: Animate product images with controlled motion
  • Storyboarding: Create multi-scene videos with shot-by-shot prompts
  • Style transfer: Apply visual style from reference images to generated content
  • Marketing content: Generate promotional videos from brand imagery

Generate videos with Kling 3 Omni

Create videos by submitting prompts with optional images and elements. The service returns a task ID for async polling or webhook notification.
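
If you poll rather than use a webhook, the loop might look like the sketch below. The source does not specify the status endpoint or the response shape, so the HTTP call is left to a caller-supplied get_status function, and the status names "COMPLETED" and "FAILED" are assumptions to be replaced with the values the API actually returns.

```python
import time
from typing import Any, Callable

# Assumed terminal status names; the source does not list the real ones.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_task(
    get_status: Callable[[str], dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call get_status(task_id) repeatedly until a terminal status or timeout.

    get_status is supplied by the caller and wraps whatever HTTP request
    the API requires; it must return a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_status(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s} s")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.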

Video-to-video with reference video

For video-to-video generation using a reference video, use the dedicated Reference-to-Video endpoints. These endpoints accept video_url and let you reference the video in your prompt as @Video1.

Parameters

Parameter | Type | Required | Default | Description
prompt | string | Conditional | - | Text prompt (max 2500 characters). Required for text-to-video.
image_url | string | No | - | Start frame image URL for image-to-video
start_image_url | string | No | - | Alternative start frame image
end_image_url | string | No | - | End frame image URL
image_urls | array | No | - | Reference images for style. Use @Image1, @Image2 in prompt
elements | array | No | - | Character/object elements. Use @Element1, @Element2 in prompt
multi_prompt | array | No | - | Shot-by-shot prompts for multi-scene videos (max 6 shots)
shot_type | string | No | customize | Multi-shot type. Only customize is supported.
aspect_ratio | string | No | 16:9 | Video ratio: 16:9, 9:16, 1:1
duration | integer | No | 5 | Duration in seconds: 3-15
generate_audio | boolean | No | - | Generate native audio for the video
voice_ids | array | No | - | Voice IDs for narration. Use <<<voice_1>>> in prompt
webhook_url | string | No | - | URL for task completion notification
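
The limits in the parameter table can be checked client-side before a request is submitted. The following is a minimal sketch, assuming the request body is a plain dict keyed by the parameter names above; validate_request is a hypothetical helper, not part of any SDK, and the server remains the authority on what is accepted.

```python
from typing import Any

MAX_PROMPT_CHARS = 2500
DURATION_RANGE_S = range(3, 16)  # 3 to 15 seconds inclusive
ASPECT_RATIOS = ("16:9", "9:16", "1:1")
MAX_SHOTS = 6
# Inputs other than prompt; if none are present the request is text-to-video.
NON_TEXT_INPUTS = ("image_url", "start_image_url", "image_urls", "elements", "multi_prompt")


def validate_request(params: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    errors: list[str] = []

    prompt = params.get("prompt") or ""
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt is {len(prompt)} chars; max is {MAX_PROMPT_CHARS}")
    if not prompt and not any(params.get(key) for key in NON_TEXT_INPUTS):
        errors.append("prompt is required for text-to-video")

    duration = params.get("duration", 5)  # documented default
    if isinstance(duration, bool) or not isinstance(duration, int) or duration not in DURATION_RANGE_S:
        errors.append("duration must be an integer from 3 to 15")

    if params.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        errors.append(f"aspect_ratio must be one of {', '.join(ASPECT_RATIOS)}")

    if len(params.get("multi_prompt") or []) > MAX_SHOTS:
        errors.append(f"multi_prompt allows at most {MAX_SHOTS} shots")

    if params.get("shot_type", "customize") != "customize":
        errors.append("shot_type only supports 'customize'")

    return errors
```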

Element definition

Field | Type | Description
reference_image_urls | array | Reference image URLs for the element. Multiple angles improve consistency
frontal_image_url | string | Frontal/primary reference image. Best with a clear face/front view
Frequently Asked Questions

What is Kling 3 Omni?
Kling 3 Omni is a multi-modal video generation API that creates videos from text prompts, images, or a combination. It supports element consistency (maintaining character/object identity), reference images for style guidance, and multi-shot mode for scene-by-scene control. Video durations range from 3 to 15 seconds.

How do I generate a video from a reference video?
For video-to-video generation (using a reference video), use the dedicated Reference-to-Video endpoints. These endpoints accept a video_url parameter and let you reference the video in your prompt as @Video1.

What is the difference between elements and image_urls?
elements maintain the consistent identity of characters or objects across the video; use them for faces, products, or recurring subjects. image_urls provide general style/appearance reference. Both can be combined: elements for character consistency, images for style guidance.

How does multi-shot mode work?
Multi-shot mode lets you create videos with multiple scenes, each with its own prompt. Provide an array of prompts via multi_prompt (max 6 shots). Each shot must be at least 3 seconds. The total duration is the sum of all shots.

What are the image requirements?
Images must be publicly accessible URLs in JPG, JPEG, or PNG format. Requirements: minimum 300x300 pixels, maximum 10 MB file size.

What are the rate limits?
Rate limits vary by subscription tier. See Rate Limits for current limits and quotas.

How is pricing determined?
Pricing varies based on model tier (Pro vs Standard) and video duration. See the Pricing page for current rates.
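
The multi-shot rules stated above (at most 6 shots, at least 3 seconds each, total duration equal to the sum) can be checked before submission. This sketch assumes each multi_prompt entry is a dict with prompt and duration keys, and that the 3-15 second duration range also caps the summed total. Neither is confirmed by the source, so treat both as placeholders for the real schema.

```python
from typing import Any

MAX_SHOTS = 6
MIN_SHOT_S = 3
MAX_TOTAL_S = 15  # assumed: the 3-15 s duration range also caps the total


def total_shot_duration(shots: list[dict[str, Any]]) -> int:
    """Validate a shot list and return its total duration in seconds.

    Each shot is assumed to look like {"prompt": str, "duration": int}.
    Raises ValueError on the first rule that is violated.
    """
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > MAX_SHOTS:
        raise ValueError(f"{len(shots)} shots given; max is {MAX_SHOTS}")
    for index, shot in enumerate(shots, start=1):
        if shot["duration"] < MIN_SHOT_S:
            raise ValueError(f"shot {index} is shorter than {MIN_SHOT_S} s")
    total = sum(shot["duration"] for shot in shots)
    if total > MAX_TOTAL_S:
        raise ValueError(f"total {total} s exceeds {MAX_TOTAL_S} s")
    return total


if __name__ == "__main__":
    storyboard = [
        {"prompt": "Wide shot of a harbor at dawn", "duration": 5},
        {"prompt": "Close-up of a fisherman coiling rope", "duration": 4},
        {"prompt": "The boat leaves the dock", "duration": 3},
    ]
    print(total_shot_duration(storyboard))
```

Under the assumed 15-second cap, six shots at the 3-second minimum (18 seconds) would be rejected, so confirm the actual total limit in the API reference before relying on this check.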

Best practices

  • Element quality: Use clear, well-lit reference images for elements. Multiple angles improve consistency.
  • Prompt structure: Reference elements as @Element1 and images as @Image1 in your prompt for best results.
  • Duration planning: Start with 5-second videos to test, then increase duration for final output.
  • Multi-shot flow: Plan shot transitions carefully; each shot should have a coherent prompt.
  • Audio options: Use generate_audio: true for ambient sound, or voice_ids for narration.
  • Production integration: Use webhooks instead of polling for scalable applications.

Related APIs

  • Kling 3: Standard Kling 3 without Omni multi-modal features
  • Kling 2.6 Pro: Previous generation with motion control
  • Runway Gen 4.5: Alternative video generation model
  • VFX: Apply visual effects to generated videos