Alibaba WAN 2.7 integration
WAN 2.7 Reference-to-Video generates videos featuring specific characters from reference images or videos, maintaining visual identity and supporting optional voice references.
Key capabilities
- Character references: Provide up to 5 combined character images and videos for identity preservation
- Prompt-based character placement: Reference characters as “Image 1”, “Image 2”, “Video 1” in the prompt
- Voice-guided generation: Optionally include a `reference_voice` audio per character for voice-guided output
- Resolution options: 720P (1280x720) and 1080P (1920x1080) output
- 5 aspect ratios: `16:9`, `9:16`, `1:1`, `4:3`, `3:4` (or auto-detect from `start_image_url`)
- Flexible durations: 2 to 10 seconds of video output
- Start frame control: Optionally provide `start_image_url` to set the first frame and auto-detect the aspect ratio
- Async processing: Webhook notifications or polling for task completion
How character references work
- Provide character images via `image_urls` (JPEG/PNG/BMP/WEBP, 240-8000px per side, max 20MB each)
- Provide character videos via `video_urls` (MP4/MOV, max 100MB each)
- Combined total of images + videos must not exceed 5
- Reference characters in the prompt using position labels: “Image 1”, “Image 2”, “Video 1”, “Video 2”
- Optionally add a `reference_voice` audio URL per character for voice-guided generation

Example prompt: “Image 1 and Image 2 are walking together in a park while Video 1 plays guitar in the background.”
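Putting these rules together, a request body might look like the sketch below. The field names come from the parameters table later in this page; the URLs are placeholders, and label order follows list position (`image_urls[0]` is “Image 1”, `video_urls[0]` is “Video 1”):

```python
# Hypothetical request body for a two-image, one-video scene.
# image_urls[0] -> "Image 1", image_urls[1] -> "Image 2",
# video_urls[0] -> "Video 1".
payload = {
    "prompt": (
        "Image 1 and Image 2 are walking together in a park "
        "while Video 1 plays guitar in the background."
    ),
    "image_urls": [
        {"url": "https://example.com/alice.png"},
        {"url": "https://example.com/bob.jpg",
         # Optional per-character voice reference:
         "reference_voice": "https://example.com/bob-voice.mp3"},
    ],
    "video_urls": [
        {"url": "https://example.com/guitarist.mp4"},
    ],
    "duration": 5,
}

# The combined reference count must not exceed 5.
total_refs = len(payload["image_urls"]) + len(payload["video_urls"])
assert total_refs <= 5
```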
Use cases
- Consistent character videos: Generate multiple videos with the same character across different scenes
- Multi-character narratives: Create scenes with up to 5 characters interacting
- Branded content: Maintain consistent mascot or spokesperson identity across video campaigns
- Voice-synchronized video: Guide character motion using voice references for natural lip and gesture sync
- Social media series: Create episodic content with recurring characters
- Virtual presenters: Generate videos of a reference person in different settings
API operations
Generate videos by submitting character references and a prompt to the API. The service returns a task ID for async polling or webhook notification.

POST /v1/ai/reference-to-video/wan-2-7
Create a new reference-to-video generation task
GET /v1/ai/reference-to-video/wan-2-7
List all WAN 2.7 R2V tasks with status
GET /v1/ai/reference-to-video/wan-2-7/{task-id}
Get task status and results by ID
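A minimal submit-and-poll client for these endpoints could look like the sketch below. The base URL, bearer-token auth, and the `task_id`/`status` response fields are assumptions for illustration; substitute your provider's actual values:

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "YOUR_API_KEY"              # placeholder credential

def task_status_url(task_id: str) -> str:
    """Build the GET-status endpoint for a task ID."""
    return f"{BASE_URL}/v1/ai/reference-to-video/wan-2-7/{task_id}"

def submit_task(payload: dict) -> str:
    """POST a generation request; return the task ID (assumed field name)."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/ai/reference-to-video/wan-2-7",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]

def poll_task(task_id: str, interval: float = 5.0) -> dict:
    """Poll until the task leaves a pending state (assumed status values)."""
    while True:
        req = urllib.request.Request(
            task_status_url(task_id),
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        with urllib.request.urlopen(req) as resp:
            task = json.load(resp)
        if task.get("status") not in ("queued", "processing"):
            return task
        time.sleep(interval)
```

For production traffic, prefer the `webhook_url` parameter over polling, as noted under best practices.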
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Scene description referencing characters as “Image 1”, “Video 1”, etc. Max 5000 characters |
| negative_prompt | string | No | - | Elements to avoid (e.g., “blurry, watermark”). Max 500 characters |
| image_urls | array | Conditional | - | Character reference images. Each item has url (required) and optional reference_voice |
| video_urls | array | Conditional | - | Character reference videos. Each item has url (required) and optional reference_voice |
| start_image_url | string | No | - | First-frame image. If provided, overrides aspect_ratio with image dimensions |
| aspect_ratio | string | No | "16:9" | Output ratio: "16:9", "9:16", "1:1", "4:3", "3:4" |
| resolution | string | No | "1080P" | Output resolution: "720P" or "1080P" |
| duration | integer | No | 5 | Video length in seconds: 2 to 10 |
| seed | integer | No | Random | Seed for reproducibility (0 to 2147483647) |
| additional_settings.prompt_extend | boolean | No | true | Enable AI prompt expansion for richer output |
| webhook_url | string | No | - | URL for async status notifications |
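The table above can be exercised end to end with an illustrative request body. Values marked as defaults mirror the table; the URLs are placeholders:

```python
# Illustrative full request body using the defaults from the parameters table.
request_body = {
    "prompt": "Image 1 walks through a neon-lit city street at night.",
    "negative_prompt": "blurry, low quality, watermark, text",
    "image_urls": [{"url": "https://example.com/hero.png"}],
    "start_image_url": None,        # omit so aspect_ratio applies
    "aspect_ratio": "16:9",         # default
    "resolution": "1080P",          # default
    "duration": 5,                  # default; allowed range is 2-10
    "seed": 42,                     # fix for reproducible output
    "additional_settings": {"prompt_extend": True},  # default
    "webhook_url": "https://example.com/hooks/wan27",  # optional
}
# Drop unset keys before sending.
request_body = {k: v for k, v in request_body.items() if v is not None}
```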
Frequently Asked Questions
What is WAN 2.7 Reference-to-Video and how does it work?
WAN 2.7 Reference-to-Video is an AI video generation API developed by Alibaba. You provide character reference images or videos along with a text prompt that describes a scene. The model generates a video featuring those characters while preserving their visual identity. You receive a task ID immediately, then poll for results or receive a webhook notification.
How many character references can I use?
You can provide up to 5 combined character references (images + videos). For example: 3 character images and 2 character videos, or 5 images and 0 videos. At least one image or video reference is required.
How do I reference characters in the prompt?
Use position labels based on the order you provide references. Image references are labeled “Image 1”, “Image 2”, etc. Video references are labeled “Video 1”, “Video 2”, etc. Example: “Image 1 and Video 1 are having a conversation at a cafe.”
What is voice-guided generation?
Each character reference (image or video) can include an optional `reference_voice` URL pointing to an audio file. The model uses this voice to guide character motion and lip movement in the generated video, creating more natural character animation.
What image and video formats are supported for references?
Image references: JPEG, PNG, BMP, WEBP (240-8000px per side, max 20MB). Video references: MP4, MOV (max 100MB). All files must be at publicly accessible URLs.
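A lightweight client-side pre-check of these limits can catch bad references before upload. The sketch below validates only extension and byte size; checking the 240-8000px dimension constraint would additionally require an image library:

```python
# Documented reference-file limits (extension and size only).
IMAGE_EXTS = {".jpeg", ".jpg", ".png", ".bmp", ".webp"}
VIDEO_EXTS = {".mp4", ".mov"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024   # 20 MB per image
MAX_VIDEO_BYTES = 100 * 1024 * 1024  # 100 MB per video

def check_reference(filename: str, size_bytes: int) -> bool:
    """Return True if the file passes the documented extension/size limits."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in IMAGE_EXTS:
        return size_bytes <= MAX_IMAGE_BYTES
    if ext in VIDEO_EXTS:
        return size_bytes <= MAX_VIDEO_BYTES
    return False  # unsupported format
```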
What are the rate limits for WAN 2.7 Reference-to-Video?
Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.
How much does WAN 2.7 Reference-to-Video cost?
See the Pricing page for current rates and subscription options.
Best practices
- Character images: Use clear, well-lit images with the character prominently visible. Avoid busy backgrounds.
- Character videos: Shorter reference videos with clear character visibility produce better identity preservation.
- Prompt structure: Explicitly name each character by label (“Image 1 walks left while Image 2 sits down”) for predictable placement.
- Voice references: Provide clean audio clips with minimal background noise for best voice-guided results.
- Duration selection: Reference-to-Video supports 2-10 seconds. Start with shorter durations for iteration.
- Negative prompts: Include terms such as “blurry, low quality, watermark, text, distortion, extra limbs”
- Production integration: Use webhooks for scalable applications instead of polling.
- Error handling: Implement retry with exponential backoff for 503 errors during high-demand periods.
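The retry recommendation above can be sketched as follows. The `ServiceUnavailable` exception is an illustrative stand-in for detecting an HTTP 503 response; the delay schedule is one common choice:

```python
import time

class ServiceUnavailable(Exception):
    """Illustrative stand-in for an HTTP 503 response."""

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on ServiceUnavailable with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Wrap the task-submission call in `with_backoff` so transient 503s during high-demand periods are absorbed rather than surfaced to users.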
Related APIs
- WAN 2.7 Text-to-Video: Generate videos from text prompts without character references
- WAN 2.7 Image-to-Video: Animate images or extend existing videos
- Kling 3 Omni: Alternative video generation model with reference video support