Kling 3 integration

Generate high-quality videos from text prompts or images using Kling’s latest V3 model with multi-shot support and advanced frame control.
Kling 3 is a dual-mode video generation API that creates professional-grade videos from either text descriptions or source images. It supports multi-shot mode for complex narratives with up to 6 scenes, first-frame and end-frame image control, and flexible durations from 3 to 15 seconds. It is available in Pro and Standard tiers so you can balance quality against cost.

Key capabilities

  • Text-to-Video (T2V): Generate videos from text prompts up to 2500 characters
  • Image-to-Video (I2V): Use first_frame and/or end_frame images to control video start and end points
  • Multi-shot mode: Create videos with up to 6 scenes, each with custom prompts and durations (max 15 seconds total)
  • Flexible durations: 3-15 seconds with per-shot duration control in multi-shot mode
  • Element consistency: Pre-registered element IDs for consistent characters/styles across videos
  • CFG scale control: Adjust prompt adherence from 0 (creative) to 1 (strict), default 0.5
  • Negative prompts: Exclude unwanted elements, styles, or artifacts
  • Async processing: Webhook notifications or polling for task completion

Pro vs Standard

| Feature | Kling 3 Pro | Kling 3 Standard |
| --- | --- | --- |
| Quality | Higher fidelity, richer detail | Good quality, cost-effective |
| Speed | Standard processing | Faster processing |
| Best for | Premium content, marketing | High-volume, testing |

Use cases

  • Marketing and advertising: Create multi-scene product narratives with consistent branding
  • Social media content: Generate vertical videos for TikTok, Instagram Reels, and YouTube Shorts
  • E-commerce: Animate product images with controlled start and end frames
  • Storyboarding: Turn scripts into multi-shot video sequences
  • Creative storytelling: Build narratives with scene-by-scene control

Generate videos with Kling 3

Create videos by submitting a text prompt (T2V), or one or more images together with a prompt (I2V), to the API. The service returns a task ID right away; use it to poll for the result, or supply a webhook_url to be notified when the task completes.
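A minimal Python sketch of the submit step, using only the standard library. The endpoint URL is a placeholder, and the Bearer Authorization header and the task_id response field are assumptions for illustration, not the documented API; substitute the values from your provider's API reference. The request is built but not sent, so the code runs offline.

```python
import json
import urllib.request

# Placeholder endpoint (hypothetical): replace with the real Kling 3 URL for your tier.
KLING3_ENDPOINT = "https://api.example.com/v1/kling3/pro"


def build_submit_request(payload: dict, api_key: str,
                         endpoint: str = KLING3_ENDPOINT) -> urllib.request.Request:
    """Wrap a generation payload in a JSON POST request (built here, not sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; some providers use a different header name.
            "Authorization": f"Bearer {api_key}",
        },
    )


def extract_task_id(response_body: bytes) -> str:
    """Read the task ID from the submit response (field name 'task_id' is assumed)."""
    return json.loads(response_body)["task_id"]
```

To actually submit, pass the request to urllib.request.urlopen, read the response body, and hand it to extract_task_id; then poll with that ID or wait for the webhook.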

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | No | - | Text prompt describing the video (max 2500 chars). Required for T2V. |
| negative_prompt | string | No | - | Text describing what to avoid (max 2500 chars) |
| image_list | array | No | - | Reference images, each with image_url and type (first_frame or end_frame) |
| multi_shot | boolean | No | false | Enable multi-shot mode for multi-scene videos |
| shot_type | string | No | - | Set to customize to use custom shot definitions |
| multi_prompt | array | No | - | Shot definitions: index (0-5), prompt, duration (min 3 s) |
| element_list | array | No | - | Pre-registered element IDs for character/style consistency |
| aspect_ratio | string | No | 16:9 | Aspect ratio: 16:9, 9:16, or 1:1 |
| duration | integer | No | 5 | Duration in seconds: 3-15 |
| cfg_scale | number | No | 0.5 | Prompt adherence: 0 (creative) to 1 (strict) |
| webhook_url | string | No | - | URL for task completion notification |
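The limits in the table can be checked client-side before a request is sent, which turns an avoidable API error into an immediate local one. The helper below is a hypothetical sketch (build_t2v_request is not part of any SDK): it validates a T2V body against the documented limits and leaves out optional fields that are not set.

```python
from typing import Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_PROMPT_CHARS = 2500


def build_t2v_request(prompt: str, *, negative_prompt: Optional[str] = None,
                      aspect_ratio: str = "16:9", duration: int = 5,
                      cfg_scale: float = 0.5,
                      webhook_url: Optional[str] = None) -> dict:
    """Return a T2V request body, raising ValueError if a documented limit is violated."""
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is required for T2V and is limited to 2500 characters")
    if negative_prompt is not None and len(negative_prompt) > MAX_PROMPT_CHARS:
        raise ValueError("negative_prompt is limited to 2500 characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if not 0 <= cfg_scale <= 1:
        raise ValueError("cfg_scale must be between 0 and 1")

    body = {"prompt": prompt, "aspect_ratio": aspect_ratio,
            "duration": duration, "cfg_scale": cfg_scale}
    # Optional fields are only included when they have a non-empty value.
    if negative_prompt:
        body["negative_prompt"] = negative_prompt
    if webhook_url:
        body["webhook_url"] = webhook_url
    return body
```

Keeping the defaults identical to the table's defaults means an omitted argument produces the same request the API would assume anyway.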

Image list item

| Field | Type | Description |
| --- | --- | --- |
| image_url | string | Publicly accessible image URL (min 300x300, max 10MB, JPG/JPEG/PNG) |
| type | string | Image role: first_frame or end_frame |
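A small sketch of assembling image_list for I2V. build_image_list is a hypothetical helper, and the example.com URLs are placeholders; passing both URLs produces the start-to-end transition the FAQ describes, while passing one controls only that end of the clip.

```python
from typing import List, Optional


def build_image_list(first_frame_url: Optional[str] = None,
                     end_frame_url: Optional[str] = None) -> List[dict]:
    """Return an image_list array with a first_frame entry, an end_frame entry, or both."""
    items = []
    if first_frame_url:
        items.append({"image_url": first_frame_url, "type": "first_frame"})
    if end_frame_url:
        items.append({"image_url": end_frame_url, "type": "end_frame"})
    if not items:
        raise ValueError("I2V needs at least one of first_frame_url or end_frame_url")
    return items


# Usage: an I2V body that moves from one product shot to another.
body = {
    "prompt": "The sneaker rotates slowly on a turntable, studio lighting",
    "image_list": build_image_list(
        "https://example.com/sneaker-front.png",
        "https://example.com/sneaker-side.png",
    ),
}
```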

Multi-prompt item

| Field | Type | Description |
| --- | --- | --- |
| index | integer | Shot order (0-5) |
| prompt | string | Text prompt for this shot (max 2500 chars) |
| duration | number | Shot duration in seconds (minimum 3) |
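The sketch below turns a list of (prompt, duration) pairs into a multi-shot request and enforces the documented limits: at most 6 shots, at least 3 seconds per shot, and at most 15 seconds in total. Taken together, those numbers mean at most five shots fit in practice (six shots of 3 seconds would already total 18 seconds), so the total-duration check is what rejects that case. The helper name is hypothetical, and setting the top-level duration to the shot total is an assumption, marked in the code.

```python
from typing import List, Tuple

MAX_SHOTS = 6
MIN_SHOT_SECONDS = 3
MAX_TOTAL_SECONDS = 15


def build_multi_shot_request(shots: List[Tuple[str, int]],
                             aspect_ratio: str = "16:9") -> dict:
    """Turn (prompt, duration) pairs into a multi-shot request body."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError("multi-shot mode supports between 1 and 6 shots")
    multi_prompt = []
    for index, (prompt, duration) in enumerate(shots):
        if not prompt or len(prompt) > 2500:
            raise ValueError(f"shot {index}: prompt must be 1-2500 characters")
        if duration < MIN_SHOT_SECONDS:
            raise ValueError(f"shot {index}: duration must be at least 3 seconds")
        # index is the shot's position in the sequence (0-5).
        multi_prompt.append({"index": index, "prompt": prompt, "duration": duration})

    total = sum(duration for _, duration in shots)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"total duration {total}s exceeds the 15-second limit")

    return {
        "multi_shot": True,
        "shot_type": "customize",
        "multi_prompt": multi_prompt,
        # Assumption: the top-level duration should equal the sum of shot durations.
        "duration": total,
        "aspect_ratio": aspect_ratio,
    }
```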

Frequently asked questions

Kling 3 is an AI video generation model that creates videos from text prompts (T2V) or images (I2V). You submit your request via the API, receive a task ID immediately, then poll for results or receive a webhook notification when processing completes. Typical generation takes 30-120 seconds depending on duration and complexity.
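For the polling path, a loop like the one below works. It is a sketch under stated assumptions: fetch_status is a hypothetical callable you supply (for example, one that requests the task's status and returns the parsed JSON), and the "completed" and "failed" status strings are placeholders for whatever values the API actually reports. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

# Placeholder status values (assumed): check the API reference for the real ones.
DONE_STATES = {"completed"}
FAILED_STATES = {"failed"}


def wait_for_task(fetch_status: Callable[[str], Dict], task_id: str,
                  interval: float = 5.0, timeout: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll fetch_status(task_id) until the task finishes, fails, or times out."""
    waited = 0.0  # counts time spent sleeping, not time spent in requests
    while True:
        result = fetch_status(task_id)
        status = result.get("status")
        if status in DONE_STATES:
            return result
        if status in FAILED_STATES:
            raise RuntimeError(f"task {task_id} failed: {result}")
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

A 5-second interval with a 300-second timeout comfortably covers the typical 30-120 second generation time while keeping request volume low; webhooks avoid polling altogether.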
Multi-shot mode lets you create videos with up to 6 distinct scenes. Each scene can have its own prompt and duration. The total duration across all shots cannot exceed 15 seconds, and each shot must be at least 3 seconds. Enable with multi_shot: true and define scenes in multi_prompt.
Use the image_list parameter to provide reference images. Set type: "first_frame" to use an image as the video’s starting point, or type: "end_frame" for the ending point. You can use both to create a transition from one image to another.
Kling 3 accepts JPG, JPEG, and PNG images via publicly accessible URLs. Requirements: minimum 300x300 pixels, maximum 10MB file size, aspect ratio between 1:2.5 and 2.5:1.
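These image requirements can be checked before upload. The hypothetical image_problems function below takes dimensions, file size, and format (obtained however you like, e.g. from an image library) and returns a list of violations, empty when the image passes. Because "10MB" could mean 10,000,000 or 10,485,760 bytes, the sketch assumes the stricter decimal value, so an image that passes here passes under either reading.

```python
from typing import List

ALLOWED_FORMATS = {"jpg", "jpeg", "png"}
MIN_SIDE = 300
# Assumption: "10MB" read as decimal megabytes (the stricter interpretation).
MAX_BYTES = 10 * 1000 * 1000
MIN_RATIO, MAX_RATIO = 1 / 2.5, 2.5  # width/height between 1:2.5 and 2.5:1


def image_problems(width: int, height: int, size_bytes: int, fmt: str) -> List[str]:
    """Return the list of documented requirements the image violates."""
    if width <= 0 or height <= 0:
        return ["invalid dimensions"]
    problems = []
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format {fmt!r}")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append(f"{width}x{height} is below the 300x300 minimum")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 10MB")
    ratio = width / height
    if not MIN_RATIO <= ratio <= MAX_RATIO:
        problems.append(f"aspect ratio {ratio:.2f} is outside 1:2.5 to 2.5:1")
    return problems
```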
CFG scale controls how closely the model follows your prompt. Use 0 for maximum creativity and artistic interpretation, 0.5 (default) for balanced results, or 1 for strict adherence to your prompt with less creative variation.
Pro delivers higher fidelity with richer detail, ideal for premium content and marketing. Standard offers good quality with faster processing, suitable for high-volume generation and testing. Both share the same parameters and capabilities.
Rate limits vary by subscription tier. See the Rate Limits page for current limits and quotas.
Pricing varies based on model tier (Pro vs Standard) and video duration. See the Pricing page for current rates.

Best practices

  • Prompt clarity: Write detailed prompts specifying subject, action, camera movement, and atmosphere
  • Start simple: Begin with single-shot mode before attempting multi-shot sequences
  • Image quality: For I2V, use high-resolution source images with clear subjects (min 300x300)
  • Duration planning: For multi-shot, plan scene durations to stay within 15-second total limit
  • Element consistency: Use pre-registered elements for recurring characters across multiple videos
  • CFG tuning: Start with 0.5, decrease for more creativity, increase for prompt precision
  • Production integration: Use webhooks instead of polling for scalable applications
  • Error handling: Implement retry logic with exponential backoff for 503 errors

Related models

  • Kling 3 Omni: Kling 3 with video reference support for motion/style guidance
  • Kling 2.6 Pro: Previous generation with motion control capabilities
  • Kling O1: High-performance video generation
  • Runway Gen 4.5: Alternative video generation model
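The error-handling practice recommended under Best practices (retry 503 errors with exponential backoff) can be sketched as follows. Here call is a hypothetical zero-argument function that performs one request and returns (status_code, body); note that urllib.request.urlopen raises urllib.error.HTTPError for a 503 instead of returning it, so a real wrapper would catch that exception and return its code. The delay doubles after each 503 up to a cap, with random "full jitter" so many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(call: Callable[[], Tuple[int, bytes]], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Retry call() while it returns HTTP 503, doubling the delay cap each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = call()
        attempt += 1
        # Return on any non-503 response, or give up after the last attempt.
        if status != 503 or attempt >= max_attempts:
            return status, body
        delay_cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Full jitter: sleep a random time between 0 and the current cap.
        sleep(random.uniform(0, delay_cap))
```

Only 503 is retried here; errors such as 400 indicate a problem with the request itself and would fail again on retry.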