Alibaba WAN 2.6 integration
WAN 2.6 is Alibaba’s latest video generation model, delivering smooth motion, high visual fidelity, and advanced features like multi-shot sequences and AI prompt expansion.
Key capabilities
- Dual input modes: Generate video from an image (i2v) or purely from text (t2v)
- Resolution options: 720p (1280x720) and 1080p (1920x1080) in landscape or portrait
- Flexible durations: 5, 10, or 15 second video outputs
- Prompt expansion: AI optimizer expands short prompts into detailed scripts for richer output
- Multi-shot sequences: Break prompts into multiple shots for narrative depth (requires prompt expansion)
- Negative prompts: Exclude unwanted elements like watermarks, blur, or distortion
- Reproducible results: Fixed seed support for consistent generation
- Async processing: Webhook notifications or polling for task completion
Use cases
- Marketing videos: Create product showcases and brand content from images or descriptions
- Social media content: Generate short-form videos for TikTok, Instagram, and YouTube
- Concept visualization: Transform static designs or ideas into motion
- Storyboarding: Use multi-shot mode to prototype narrative sequences
- Educational content: Illustrate concepts with AI-generated video explanations
- Creative exploration: Experiment with text prompts to generate unique visual content
Generation modes
| Mode | Input | Best for |
|---|---|---|
| Image-to-Video (i2v) | Image URL + prompt | Animating existing images, product videos, bringing artwork to life |
| Text-to-Video (t2v) | Text prompt only | Creating videos from scratch, concept exploration, narrative content |
Resolution variants
| Variant | Resolution | Orientation | Use case |
|---|---|---|---|
| 720p Landscape | 1280x720 | Horizontal | Web videos, presentations |
| 720p Portrait | 720x1280 | Vertical | Social media stories, mobile |
| 1080p Landscape | 1920x1080 | Horizontal | High-quality marketing, YouTube |
| 1080p Portrait | 1080x1920 | Vertical | Premium social content, ads |
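The variants above map onto the `size` strings listed in the parameters table, which use `*` rather than `x` as the separator. A minimal sketch of that mapping, assuming a hypothetical helper named `size_for` (not part of any official SDK):

```python
# Hypothetical helper: turns a resolution variant into the "WIDTH*HEIGHT"
# string format used by the `size` parameter.
SIZES = {
    ("720p", "landscape"): (1280, 720),
    ("720p", "portrait"): (720, 1280),
    ("1080p", "landscape"): (1920, 1080),
    ("1080p", "portrait"): (1080, 1920),
}


def size_for(resolution: str, orientation: str) -> str:
    """Return the size string for a documented resolution/orientation pair."""
    key = (resolution.lower(), orientation.lower())
    if key not in SIZES:
        raise ValueError(f"unsupported variant: {resolution} {orientation}")
    width, height = SIZES[key]
    return f"{width}*{height}"
```

For example, `size_for("1080p", "portrait")` yields the string to send for vertical 1080p output.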
API Operations
Image-to-Video (i2v)
Generate videos from an input image with motion guidance via prompts.
Text-to-Video (t2v)
Generate videos purely from text descriptions without an input image.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Scene description, motion, camera moves, style. Max 2000 characters |
| image | string | i2v only | - | URL of the keyframe image to animate (JPEG, PNG, WebP) |
| size | string | No | 1280*720 | Output size: 1280*720, 720*1280, 1920*1080, 1080*1920 |
| duration | string | No | 5 | Video length: 5, 10, or 15 seconds |
| negative_prompt | string | No | - | Elements to avoid (e.g., “blurry, watermark”). Max 1000 characters |
| enable_prompt_expansion | boolean | No | false | Enable AI to expand prompts into detailed scripts |
| shot_type | string | No | single | single for continuous shot, multi for scene transitions |
| seed | integer | No | -1 | Random seed for reproducibility (-1 for random) |
| webhook_url | string | No | - | URL for async status notifications |
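Checking a request locally against these limits can catch mistakes before a task is submitted. The sketch below builds a request body from the parameters in the table. The function name `build_payload` is hypothetical, and it is an assumption that the parameters are sent as a flat JSON object with these exact keys. The endpoint and authentication are not covered here.

```python
from typing import Any, Dict, Optional

ALLOWED_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}
ALLOWED_DURATIONS = {"5", "10", "15"}
ALLOWED_SHOT_TYPES = {"single", "multi"}


def build_payload(
    prompt: str,
    image: Optional[str] = None,  # set for i2v, omit for t2v
    size: str = "1280*720",
    duration: str = "5",
    negative_prompt: Optional[str] = None,
    enable_prompt_expansion: bool = False,
    shot_type: str = "single",
    seed: int = -1,
    webhook_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate arguments against the documented limits and return a request body."""
    if not prompt or len(prompt) > 2000:
        raise ValueError("prompt is required and limited to 2000 characters")
    if negative_prompt is not None and len(negative_prompt) > 1000:
        raise ValueError("negative_prompt is limited to 1000 characters")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if str(duration) not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5, 10, or 15")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError("shot_type must be 'single' or 'multi'")
    if shot_type == "multi" and not enable_prompt_expansion:
        raise ValueError("shot_type 'multi' requires enable_prompt_expansion=True")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "size": size,
        "duration": str(duration),  # the table lists duration as a string
        "enable_prompt_expansion": enable_prompt_expansion,
        "shot_type": shot_type,
        "seed": seed,
    }
    # Optional fields are included only when the caller sets them.
    for key, value in (
        ("image", image),
        ("negative_prompt", negative_prompt),
        ("webhook_url", webhook_url),
    ):
        if value is not None:
            payload[key] = value
    return payload
```

Calling `build_payload("A paper boat drifting down a rainy street")` returns a t2v body with the documented defaults. Passing `image=...` turns the same body into an i2v request.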
Frequently Asked Questions
What is WAN 2.6 and what can it do?
WAN 2.6 is Alibaba’s video generation model that creates AI videos from images (i2v) or text prompts (t2v). It produces smooth, cinematic videos up to 15 seconds at 720p or 1080p resolution with features like multi-shot sequences and prompt expansion for richer narratives.
What is the difference between image-to-video and text-to-video modes?
Image-to-video (i2v) animates an existing image you provide, giving you control over the visual starting point. Text-to-video (t2v) generates videos purely from text descriptions, creating visuals entirely from your prompt without an input image.
What video durations does WAN 2.6 support?
WAN 2.6 supports 5, 10, and 15 second durations. Shorter durations (5s) generate faster, while longer durations (15s) allow for more developed scenes and narratives.
What resolutions are available?
WAN 2.6 offers 720p (1280x720) and 1080p (1920x1080) in both landscape and portrait orientations. Use 720p for faster generation and 1080p for higher quality output.
What is prompt expansion and when should I use it?
Prompt expansion uses AI to transform short, simple prompts into detailed scripts before generation. Enable it when you have a basic idea but want richer, more cinematic output. It is required for multi-shot mode.
What is multi-shot mode?
Multi-shot mode (`shot_type: multi`) breaks your prompt into multiple shots with scene transitions, creating a narrative sequence instead of a single continuous shot. It requires `enable_prompt_expansion: true`.
How long does video generation take?
Generation time varies by resolution, duration, and server load. Typical processing ranges from 30 seconds to several minutes. Use webhooks for production workflows instead of polling.
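Where polling is still needed, for example in local scripts, spacing the checks out avoids unnecessary requests while a task runs. The following is a minimal sketch of polling with a growing delay. The status lookup is passed in as a function, so no endpoint is assumed. The status names `COMPLETED` and `FAILED` and the `status` field are assumptions, not confirmed response values.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the task reaches a final state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        task = fetch_status()
        # Assumed terminal status names; adjust to the real response schema.
        if task.get("status") in ("COMPLETED", "FAILED"):
            return task
        if time.monotonic() + delay > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(delay)
        # Grow the delay gradually, capped at max_delay_s.
        delay = min(delay * 1.5, max_delay_s)
```

Here `fetch_status` would wrap whatever HTTP call retrieves the task. Injecting it (and `sleep`) keeps the loop easy to test without network access.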
What are the rate limits for WAN 2.6?
Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.
How much does WAN 2.6 cost?
Pricing varies by resolution and duration. See the Pricing page for current rates and credit costs.
What image formats are supported for i2v?
Image-to-video accepts JPEG, PNG, and WebP images via publicly accessible URLs. Use high-quality images with clear subjects for best results.
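A quick pre-check of the image URL can catch obviously unsupported inputs before submitting an i2v task. This sketch only inspects the URL scheme and file extension, which is a heuristic: it does not confirm that the URL is publicly reachable, and URLs without an extension may still serve a valid image. The helper name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def looks_like_supported_image(url: str) -> bool:
    """Return True if the URL is http(s) and ends in a JPEG, PNG, or WebP extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # parsed.path excludes the query string, so "?v=2" does not affect the suffix.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS
```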
Best practices
- Prompt writing: Be specific about scenes, camera movements (zoom, pan, tilt), lighting, and atmosphere for better results
- Image selection (i2v): Use high-resolution images with clear subjects and balanced lighting; avoid compressed or noisy inputs
- Negative prompts: List common artifacts to exclude, for example “blurry, low quality, watermark, text, distortion”
- Duration selection: Start with 5 seconds for quick iterations, then increase to 10-15s for final outputs
- Prompt expansion: Enable for short prompts or when you want the AI to add cinematic details
- Multi-shot planning: For multi-shot mode, structure your prompt with clear scene descriptions
- Production integration: Use webhooks for scalable applications instead of polling
- Reproducibility: Save the `seed` value from successful generations to recreate similar results
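Several of these practices combine into a draft-then-final workflow: iterate with short 720p clips and a fixed seed, then reuse the same prompt and seed at a longer duration. A minimal sketch with hypothetical helper names follows. How closely a longer or higher-resolution render matches the draft for the same seed is not stated here, so treat the result as similar rather than identical.

```python
from typing import Any, Dict

# Artifact list taken from the negative-prompt recommendation above.
DEFAULT_NEGATIVE_PROMPT = "blurry, low quality, watermark, text, distortion"


def draft_settings(prompt: str, seed: int) -> Dict[str, Any]:
    """Short, 720p settings for quick iteration with a fixed seed."""
    return {
        "prompt": prompt,
        "negative_prompt": DEFAULT_NEGATIVE_PROMPT,
        "size": "1280*720",
        "duration": "5",
        "seed": seed,
    }


def final_settings(
    draft: Dict[str, Any], size: str = "1920*1080", duration: str = "10"
) -> Dict[str, Any]:
    """Copy a draft, keeping prompt and seed, and raise size and duration."""
    final = dict(draft)  # leave the draft untouched
    final.update(size=size, duration=duration)
    return final
```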
Related APIs
- WAN 2.5: Previous WAN version with 480p, 720p, and 1080p options
- Kling 2.5 Turbo Pro: Alternative i2v model with cinematic quality
- Kling Pro v2.1: High-fidelity i2v with expressive motion
- PixVerse V5: Fast i2v with style consistency