Alibaba WAN Integration

WAN 2.5 is developed by Alibaba and delivers high-quality text-to-video generation with multiple resolution options and prompt expansion capabilities.
WAN 2.5 Text-to-Video generates videos directly from text descriptions. Describe your scene, characters, motion, and camera movements in a prompt, and the API produces video output at your chosen resolution. Available in 480p, 720p, and 1080p variants with 5 or 10 second durations, suitable for rapid prototyping, social media content, and marketing assets.

Key capabilities

  • Multiple resolutions: Choose between 480p, 720p, or 1080p based on quality and speed requirements
  • Flexible duration: Generate 5-second clips for fast iteration or 10-second videos for more developed action
  • Prompt expansion: AI-powered prompt optimizer expands simple ideas into detailed video scripts
  • Negative prompts: Exclude unwanted elements like blur, watermarks, or distortion from output
  • Reproducible results: Use seed values to regenerate similar videos with identical parameters
  • Async processing: Webhook notifications or polling for task completion
  • Maximum prompt length: 800 characters for main prompt, 500 characters for negative prompt

Resolution comparison

Resolution | Best for | Processing speed | Output quality
---------- | -------- | ---------------- | --------------
480p | Rapid prototyping, previews, mobile-first content | Fastest | Good
720p | Social media, web content, balanced quality/speed | Medium | Better
1080p | Marketing assets, professional content, high-detail scenes | Slower | Best

Use cases

  • Social media content: Quick video clips for TikTok, Instagram Reels, and YouTube Shorts
  • Marketing previews: Rapid concept visualization before full production
  • Product demonstrations: Animated product showcases from text descriptions
  • Educational content: Explainer videos and visual learning materials
  • Creative exploration: Experimental motion and abstract visualizations
  • Storyboarding: Visual previews for film and video pre-production

API operations

Generate videos by submitting a text prompt to the appropriate resolution endpoint. The service returns a task ID for async polling or webhook notification.
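The submit-then-poll flow described above could be sketched roughly as follows. This is only an illustration: `fetch_status` is a caller-supplied function (for example, a thin wrapper around your HTTP client), and the `status` field name and the `"COMPLETED"` / `"FAILED"` values are assumptions, since this page does not specify the task response format.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task until it finishes, fails, or the timeout elapses.

    `fetch_status` is hypothetical: it should return the task as a dict.
    The "status" key and its values are assumed, not documented here.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "COMPLETED":  # assumed terminal success value
            return task
        if status == "FAILED":  # assumed terminal failure value
            raise RuntimeError(f"task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.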

Parameters

Parameter | Type | Required | Default | Description
--------- | ---- | -------- | ------- | -----------
prompt | string | Yes | - | Main description of the video including scene, characters, motion, camera moves, and style. Maximum 800 characters.
duration | string | No | "5" | Video length: "5" seconds (faster) or "10" seconds (more action)
negative_prompt | string | No | - | Elements to avoid in the output (e.g., “blurry, low quality, watermark”). Maximum 500 characters.
enable_prompt_expansion | boolean | No | true | AI optimizer expands shorter prompts into detailed scripts
seed | integer | No | Random | Seed for reproducibility (0 to 2147483647). Use same seed with identical parameters for similar results.
webhook_url | string | No | - | URL for async completion notifications
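As a rough illustration of the limits in the table above, a client might validate its arguments before sending a request. The helper name `build_wan25_payload` is hypothetical, and the assumption that the request body is a flat JSON object keyed by the parameter names is not confirmed by this page; only the parameter names, defaults, and limits come from the table.

```python
from typing import Any, Dict, Optional

MAX_PROMPT_CHARS = 800           # documented limit for `prompt`
MAX_NEGATIVE_PROMPT_CHARS = 500  # documented limit for `negative_prompt`
ALLOWED_DURATIONS = {"5", "10"}  # durations are passed as strings
MAX_SEED = 2147483647            # documented upper bound for `seed`


def build_wan25_payload(
    prompt: str,
    duration: str = "5",
    negative_prompt: Optional[str] = None,
    enable_prompt_expansion: bool = True,
    seed: Optional[int] = None,
    webhook_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Check arguments against the documented limits and return a request body.

    The flat dict layout is an assumption about the request format.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError('duration must be "5" or "10"')

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "enable_prompt_expansion": enable_prompt_expansion,
    }
    # Optional fields are only included when the caller sets them.
    if negative_prompt is not None:
        if len(negative_prompt) > MAX_NEGATIVE_PROMPT_CHARS:
            raise ValueError(
                f"negative_prompt exceeds {MAX_NEGATIVE_PROMPT_CHARS} characters"
            )
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        if not 0 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be between 0 and {MAX_SEED}")
        payload["seed"] = seed
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload
```

Validating locally catches over-long prompts and out-of-range seeds before a request is spent on them.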

Frequently Asked Questions

What is WAN 2.5 Text-to-Video and how does it work?
WAN 2.5 Text-to-Video is an AI video generation API developed by Alibaba. You submit a text prompt describing your desired video, receive a task ID immediately, then poll for results or receive a webhook notification when processing completes. The model interprets your description and generates a video matching the scene, motion, and style you specified.

Which resolution should I choose?
Choose based on your use case: 480p is fastest and ideal for rapid prototyping or mobile-first content. 720p balances quality and speed, suitable for most social media and web content. 1080p delivers the highest quality for marketing assets and professional content but takes longer to process.

How long does video generation take?
Processing time depends on resolution and duration. 480p processes fastest, followed by 720p, then 1080p. A 5-second clip generates faster than a 10-second clip. For production workflows, use webhooks instead of polling.

How do I write effective prompts?
Be specific about scenes and visual details. Describe camera movements (zoom, pan, tilt), lighting, atmosphere, and subject actions. Example: “fluffy orange cat on wooden windowsill, looking at snow falling outside, soft warm lighting, slow camera zoom in” produces better results than “cat looking outside.”

What does prompt expansion do?
When enabled (default), the AI optimizer expands shorter prompts into detailed video scripts. This is useful when you have a simple idea but want richer video output. Disable it if you want precise control over exactly what the model generates.

What are the rate limits?
Rate limits vary by subscription tier. See Rate Limits for current limits.

How much does it cost?
See the Pricing page for current rates and subscription options.

What is the difference between WAN 2.5 and WAN 2.6?
WAN 2.5 Text-to-Video offers 480p, 720p, and 1080p resolution options with prompt expansion. WAN 2.6 provides enhanced quality and is available for both text-to-video and image-to-video workflows. Choose WAN 2.5 for more resolution flexibility; choose WAN 2.6 for the latest quality improvements.

Best practices

  • Prompt writing: Be specific about scenes, camera movements, lighting, and subject actions. Detailed prompts produce better results than vague descriptions.
  • Resolution selection: Start with 480p for rapid iteration, then switch to higher resolutions for final output.
  • Duration choice: Use 5-second clips for quick previews; 10-second clips allow more complex motion and narrative development.
  • Negative prompts: Include common issues to avoid: “blurry, low quality, watermark, text, distortion, extra limbs.”
  • Reproducibility: Save the seed value if you like a result and want to generate variations with similar characteristics.
  • Production integration: Use webhooks for scalable applications instead of polling.
  • Error handling: Implement retry with exponential backoff for 503 errors during high-demand periods.
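The retry advice in the last bullet could be implemented along these lines. This is a minimal sketch, not the service's prescribed approach: `send` is a caller-supplied function assumed to return a `(status_code, body)` tuple, and the attempt count, delays, and jitter fraction are illustrative defaults rather than documented values.

```python
import random
import time
from typing import Any, Callable, Tuple


def retry_on_503(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    jitter: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send` until it returns a non-503 status or attempts run out.

    The wait doubles after each 503 (1s, 2s, 4s, ...), capped at
    `max_delay_s`, plus a small random jitter so that many clients
    do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * jitter))
    raise RuntimeError(f"still receiving 503 after {max_attempts} attempts")
```

Only 503 responses are retried here; other error statuses are returned to the caller unchanged so they can be handled explicitly.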