Alibaba WAN 2.6 integration

WAN 2.6 is Alibaba’s latest video generation model, available as a versatile API that supports both image-to-video (i2v) and text-to-video (t2v) workflows. It generates cinematic videos with smooth motion, strong temporal consistency, and detailed visuals at 720p or 1080p resolution. Outputs run 5, 10, or 15 seconds, and advanced features like AI prompt expansion and multi-shot composition enable richer narratives.

Key capabilities

  • Dual input modes: Generate video from an image (i2v) or purely from text (t2v)
  • Resolution options: 720p (1280x720) and 1080p (1920x1080) in landscape or portrait
  • Flexible durations: 5, 10, or 15 second video outputs
  • Prompt expansion: AI optimizer expands short prompts into detailed scripts for richer output
  • Multi-shot sequences: Break prompts into multiple shots for narrative depth (requires prompt expansion)
  • Negative prompts: Exclude unwanted elements like watermarks, blur, or distortion
  • Reproducible results: Fixed seed support for consistent generation
  • Async processing: Webhook notifications or polling for task completion

Use cases

  • Marketing videos: Create product showcases and brand content from images or descriptions
  • Social media content: Generate short-form videos for TikTok, Instagram, and YouTube
  • Concept visualization: Transform static designs or ideas into motion
  • Storyboarding: Use multi-shot mode to prototype narrative sequences
  • Educational content: Illustrate concepts with AI-generated video explanations
  • Creative exploration: Experiment with text prompts to generate unique visual content

Generation modes

| Mode | Input | Best for |
| --- | --- | --- |
| Image-to-Video (i2v) | Image URL + prompt | Animating existing images, product videos, bringing artwork to life |
| Text-to-Video (t2v) | Text prompt only | Creating videos from scratch, concept exploration, narrative content |
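In practice, the two modes differ only in whether an `image` URL is supplied. A minimal sketch of the two payloads, using the field names from the Parameters table below (the surrounding request structure, endpoint, and authentication are assumptions, not documented on this page):

```python
# Example request payloads for the two generation modes.
# Field names follow the Parameters table; anything beyond them
# (endpoint, auth, response shape) is an illustrative assumption.

i2v_payload = {
    "prompt": "The product slowly rotates as soft studio light sweeps across it",
    "image": "https://example.com/product.jpg",  # keyframe image, i2v only
    "size": "1280*720",
    "duration": "5",
}

t2v_payload = {
    "prompt": "A drone shot gliding over a misty mountain lake at sunrise",
    "size": "1920*1080",
    "duration": "10",
}

# i2v adds the keyframe image; everything else is shared between modes.
print("image" in i2v_payload, "image" in t2v_payload)
```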

Resolution variants

| Variant | Resolution | Orientation | Use case |
| --- | --- | --- | --- |
| 720p Landscape | 1280x720 | Horizontal | Web videos, presentations |
| 720p Portrait | 720x1280 | Vertical | Social media stories, mobile |
| 1080p Landscape | 1920x1080 | Horizontal | High-quality marketing, YouTube |
| 1080p Portrait | 1080x1920 | Vertical | Premium social content, ads |

API Operations

Image-to-Video (i2v)

Generate videos from an input image with motion guidance via prompts.

Text-to-Video (t2v)

Generate videos purely from text descriptions without an input image.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | - | Scene description, motion, camera moves, style. Max 2000 characters |
| image | string | i2v only | - | URL of the keyframe image to animate (JPEG, PNG, WebP) |
| size | string | No | 1280*720 | Output size: 1280*720, 720*1280, 1920*1080, 1080*1920 |
| duration | string | No | 5 | Video length: 5, 10, or 15 seconds |
| negative_prompt | string | No | - | Elements to avoid (e.g., “blurry, watermark”). Max 1000 characters |
| enable_prompt_expansion | boolean | No | false | Enable AI to expand prompts into detailed scripts |
| shot_type | string | No | single | single for continuous shot, multi for scene transitions |
| seed | integer | No | -1 | Random seed for reproducibility (-1 for random) |
| webhook_url | string | No | - | URL for async status notifications |
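The constraints in the table (character limits, allowed sizes and durations, the multi-shot dependency on prompt expansion) can be checked client-side before submitting a request. A hypothetical validation helper, not part of the API itself:

```python
# Client-side validation of WAN 2.6 parameters per the table above.
# The API performs its own validation; this helper just fails fast.

VALID_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}
VALID_DURATIONS = {"5", "10", "15"}

def validate_params(params: dict, mode: str = "t2v") -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    prompt = params.get("prompt", "")
    if not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 2000:
        errors.append("prompt exceeds 2000 characters")
    if mode == "i2v" and not params.get("image"):
        errors.append("image URL is required for i2v")
    if params.get("size", "1280*720") not in VALID_SIZES:
        errors.append("size must be one of: " + ", ".join(sorted(VALID_SIZES)))
    if params.get("duration", "5") not in VALID_DURATIONS:
        errors.append("duration must be 5, 10, or 15")
    if len(params.get("negative_prompt", "")) > 1000:
        errors.append("negative_prompt exceeds 1000 characters")
    if params.get("shot_type", "single") == "multi" and not params.get("enable_prompt_expansion"):
        errors.append("shot_type 'multi' requires enable_prompt_expansion: true")
    return errors

print(validate_params({"prompt": "A cat leaps onto a windowsill", "shot_type": "multi"}))
```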

Frequently Asked Questions

What is WAN 2.6?
WAN 2.6 is Alibaba’s video generation model that creates AI videos from images (i2v) or text prompts (t2v). It produces smooth, cinematic videos up to 15 seconds at 720p or 1080p resolution with features like multi-shot sequences and prompt expansion for richer narratives.

What is the difference between image-to-video and text-to-video?
Image-to-video (i2v) animates an existing image you provide, giving you control over the visual starting point. Text-to-video (t2v) generates videos purely from text descriptions, creating visuals entirely from your prompt without an input image.

What video durations are supported?
WAN 2.6 supports 5, 10, and 15 second durations. Shorter durations (5s) generate faster, while longer durations (15s) allow for more developed scenes and narratives.

What resolutions are available?
WAN 2.6 offers 720p (1280x720) and 1080p (1920x1080) in both landscape and portrait orientations. Use 720p for faster generation and 1080p for higher quality output.

What is prompt expansion?
Prompt expansion uses AI to transform short, simple prompts into detailed scripts before generation. Enable it when you have a basic idea but want richer, more cinematic output. It is required for multi-shot mode.

What is multi-shot mode?
Multi-shot mode (shot_type: multi) breaks your prompt into multiple scene transitions, creating a narrative sequence instead of a single continuous shot. It requires enable_prompt_expansion: true.

How long does generation take?
Generation time varies by resolution, duration, and server load. Typical processing ranges from 30 seconds to several minutes. Use webhooks for production workflows instead of polling.

What are the rate limits?
Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.

How much does it cost?
Pricing varies by resolution and duration. See the Pricing page for current rates and credit costs.

What image formats are supported for image-to-video?
Image-to-video accepts JPEG, PNG, and WebP images via publicly accessible URLs. Use high-quality images with clear subjects for best results.
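For async processing without a webhook, a client polls the task until it completes. The loop below is a sketch: the status values and the shape of the status response are illustrative assumptions (this page does not document them), and the fetch function is injected so the loop itself stays testable without network access.

```python
import time

# Poll a generation task until it finishes. The status values
# ("processing", "succeeded", "failed") and response fields are
# illustrative assumptions, not documented API behavior.

def poll_task(fetch_status, interval_s: float = 5.0, timeout_s: float = 600.0,
              sleep=time.sleep) -> dict:
    """fetch_status() returns a dict like {"status": ..., "video_url": ...}."""
    waited = 0.0
    while waited < timeout_s:
        result = fetch_status()
        if result["status"] in ("succeeded", "failed"):
            return result
        sleep(interval_s)  # back off between polls
        waited += interval_s
    raise TimeoutError("generation did not complete within the timeout")

# Usage with a stubbed status sequence (no real API calls):
responses = iter([
    {"status": "processing"},
    {"status": "succeeded", "video_url": "https://example.com/out.mp4"},
])
result = poll_task(lambda: next(responses), sleep=lambda s: None)
print(result["status"])
```

For production workloads, prefer setting `webhook_url` so the service notifies you on completion instead of tying up a polling loop.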

Best practices

  • Prompt writing: Be specific about scenes, camera movements (zoom, pan, tilt), lighting, and atmosphere for better results
  • Image selection (i2v): Use high-resolution images with clear subjects and balanced lighting; avoid compressed or noisy inputs
  • Negative prompts: Always include common artifacts to avoid: “blurry, low quality, watermark, text, distortion”
  • Duration selection: Start with 5 seconds for quick iterations, then increase to 10-15s for final outputs
  • Prompt expansion: Enable for short prompts or when you want the AI to add cinematic details
  • Multi-shot planning: For multi-shot mode, structure your prompt with clear scene descriptions
  • Production integration: Use webhooks for scalable applications instead of polling
  • Reproducibility: Save the seed value from successful generations to recreate similar results
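Two of these practices, always including common artifacts in the negative prompt and pinning the seed for reproducible reruns, can be captured in a small payload helper. This is a sketch, not an official client; the helper name and defaults are assumptions:

```python
# Apply two best practices from the list above: merge a default negative
# prompt covering common artifacts, and pin the seed so a successful
# generation can be recreated. Illustrative helper, not part of the API.

DEFAULT_NEGATIVE = "blurry, low quality, watermark, text, distortion"

def build_payload(prompt: str, seed: int = -1, negative_prompt: str = "", **extra) -> dict:
    terms = [DEFAULT_NEGATIVE]
    if negative_prompt:
        terms.append(negative_prompt)
    payload = {
        "prompt": prompt,
        "negative_prompt": ", ".join(terms),
        "seed": seed,  # reuse a saved seed to recreate similar results
    }
    payload.update(extra)  # e.g. size, duration, shot_type
    return payload

p = build_payload("A neon city street at night, slow dolly forward",
                  seed=42, negative_prompt="cartoon", duration="5")
print(p["seed"], "|", p["negative_prompt"])
```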