
Alibaba WAN 2.7 integration

WAN 2.7 Reference-to-Video is an AI video generation API that creates MP4 videos featuring specific characters from reference images or videos. You provide up to 5 character references (images and/or videos combined), then describe a scene in the prompt using labels like “Image 1” or “Video 1” to place those characters. The model maintains the visual identity of referenced characters across the generated video, and each reference can optionally include a voice reference. Output is available at 720P (1280x720) or 1080P (1920x1080) resolution with durations from 2 to 10 seconds.

Key capabilities

  • Character references: Provide up to 5 combined character images and videos for identity preservation
  • Prompt-based character placement: Reference characters as “Image 1”, “Image 2”, “Video 1” in the prompt
  • Voice-guided generation: Optionally include reference_voice audio per character for voice-guided output
  • Resolution options: 720P (1280x720) and 1080P (1920x1080) output
  • 5 aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4 (or auto-detect from start_image_url)
  • Flexible durations: 2 to 10 seconds of video output
  • Start frame control: Optionally provide start_image_url to set the first frame and auto-detect aspect ratio
  • Async processing: Webhook notifications or polling for task completion

How character references work

  1. Provide character images via image_urls (JPEG/PNG/BMP/WEBP, 240-8000px, max 20MB each)
  2. Provide character videos via video_urls (MP4/MOV, max 100MB each)
  3. Combined total of images + videos must not exceed 5
  4. Reference characters in the prompt using position labels: “Image 1”, “Image 2”, “Video 1”, “Video 2”
  5. Optionally add reference_voice audio URL per character for voice-guided generation
Example prompt:
“Image 1 and Image 2 are walking together in a park while Video 1 plays guitar in the background.”
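
The ordering rules above can be sketched as a small helper that maps each reference, in the order provided, to its prompt label. The function name and error messages here are illustrative, not part of the API:

```python
def label_references(image_urls, video_urls):
    """Map references to prompt labels: images become "Image 1", "Image 2", ...
    and videos become "Video 1", "Video 2", ..., each numbered by position."""
    if not image_urls and not video_urls:
        raise ValueError("At least one image or video reference is required")
    if len(image_urls) + len(video_urls) > 5:
        raise ValueError("Combined images + videos must not exceed 5")
    labels = {}
    for i, ref in enumerate(image_urls, start=1):
        labels[f"Image {i}"] = ref
    for i, ref in enumerate(video_urls, start=1):
        labels[f"Video {i}"] = ref
    return labels
```

Note that images and videos are numbered independently, so a request with two images and one video yields the labels “Image 1”, “Image 2”, and “Video 1”.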

Use cases

  • Consistent character videos: Generate multiple videos with the same character across different scenes
  • Multi-character narratives: Create scenes with up to 5 characters interacting
  • Branded content: Maintain consistent mascot or spokesperson identity across video campaigns
  • Voice-synchronized video: Guide character motion using voice references for natural lip and gesture sync
  • Social media series: Create episodic content with recurring characters
  • Virtual presenters: Generate videos of a reference person in different settings

API operations

Generate videos by submitting character references and a prompt to the API. The service returns a task ID for async polling or webhook notification.

POST /v1/ai/reference-to-video/wan-2-7

Create a new reference-to-video generation task

GET /v1/ai/reference-to-video/wan-2-7

List all WAN 2.7 R2V tasks with status

GET /v1/ai/reference-to-video/wan-2-7/{task-id}

Get task status and results by ID
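
A minimal Python sketch of this submit-then-poll lifecycle. The base URL, Bearer-token auth header, and response field names ("task_id", "status", and the "pending"/"processing" states) are assumptions for illustration; check the full API reference for the exact shapes.

```python
import json
import time
import urllib.request

BASE = "https://api.example.com"  # assumed host, replace with the real one
ENDPOINT = "/v1/ai/reference-to-video/wan-2-7"

def create_task_request(payload, api_key):
    """Build the POST request that creates a new generation task."""
    return urllib.request.Request(
        BASE + ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def task_status_url(task_id):
    """URL for fetching a single task's status and results by ID."""
    return f"{BASE}{ENDPOINT}/{task_id}"

def poll(task_id, fetch, interval=5.0, timeout=600.0):
    """Call `fetch(url)` until the task leaves a pending state or times out.
    `fetch` performs the GET and returns the decoded JSON as a dict."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch(task_status_url(task_id))
        if result.get("status") not in ("pending", "processing"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
```

For production use, prefer the webhook_url parameter over polling, as noted in the best practices below.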

Parameters

  • prompt (string, required): Scene description referencing characters as “Image 1”, “Video 1”, etc. Max 5000 characters
  • negative_prompt (string, optional): Elements to avoid (e.g., “blurry, watermark”). Max 500 characters
  • image_urls (array, conditional): Character reference images. Each item has url (required) and optional reference_voice
  • video_urls (array, conditional): Character reference videos. Each item has url (required) and optional reference_voice
  • start_image_url (string, optional): First-frame image. If provided, overrides aspect_ratio with the image's dimensions
  • aspect_ratio (string, optional, default "16:9"): Output ratio: "16:9", "9:16", "1:1", "4:3", "3:4"
  • resolution (string, optional, default "1080P"): Output resolution: "720P" or "1080P"
  • duration (integer, optional, default 5): Video length in seconds: 2 to 10
  • seed (integer, optional, default random): Seed for reproducibility (0 to 2147483647)
  • additional_settings.prompt_extend (boolean, optional, default true): Enable AI prompt expansion for richer output
  • webhook_url (string, optional): URL for async status notifications

At least one of image_urls or video_urls is required, and their combined count must not exceed 5.
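
The documented limits can be checked client-side before submission to fail fast on invalid requests. This sketch mirrors the constraints listed above; the helper itself and its error messages are not part of the API:

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720P", "1080P"}

def validate_payload(p):
    """Raise ValueError if payload dict `p` violates a documented constraint."""
    if not p.get("prompt"):
        raise ValueError("prompt is required")
    if len(p["prompt"]) > 5000:
        raise ValueError("prompt exceeds 5000 characters")
    if len(p.get("negative_prompt", "")) > 500:
        raise ValueError("negative_prompt exceeds 500 characters")
    refs = len(p.get("image_urls", [])) + len(p.get("video_urls", []))
    if refs == 0:
        raise ValueError("at least one image or video reference is required")
    if refs > 5:
        raise ValueError("combined references must not exceed 5")
    if p.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        raise ValueError("invalid aspect_ratio")
    if p.get("resolution", "1080P") not in RESOLUTIONS:
        raise ValueError("invalid resolution")
    if not 2 <= p.get("duration", 5) <= 10:
        raise ValueError("duration must be 2 to 10 seconds")
    if not 0 <= p.get("seed", 0) <= 2147483647:
        raise ValueError("seed out of range")
    return p
```

Server-side validation still applies; this only catches obvious mistakes before a round trip.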

Frequently Asked Questions

What is WAN 2.7 Reference-to-Video?
WAN 2.7 Reference-to-Video is an AI video generation API developed by Alibaba. You provide character reference images or videos along with a text prompt that describes a scene. The model generates a video featuring those characters while preserving their visual identity. You receive a task ID immediately, then poll for results or receive a webhook notification.

How many character references can I provide?
You can provide up to 5 combined character references (images + videos). For example: 3 character images and 2 character videos, or 5 images and 0 videos. At least one image or video reference is required.

How do I reference characters in the prompt?
Use position labels based on the order you provide references. Image references are labeled “Image 1”, “Image 2”, etc. Video references are labeled “Video 1”, “Video 2”, etc. Example: “Image 1 and Video 1 are having a conversation at a cafe.”

How does voice-guided generation work?
Each character reference (image or video) can include an optional reference_voice URL pointing to an audio file. The model uses this voice to guide character motion and lip movement in the generated video, creating more natural character animation.

What file formats are supported?
Image references: JPEG, PNG, BMP, WEBP (240-8000px per side, max 20MB). Video references: MP4, MOV (max 100MB). All files must be at publicly accessible URLs.

What are the rate limits?
Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.

How much does it cost?
See the Pricing page for current rates and subscription options.

Best practices

  • Character images: Use clear, well-lit images with the character prominently visible. Avoid busy backgrounds.
  • Character videos: Shorter reference videos with clear character visibility produce better identity preservation.
  • Prompt structure: Explicitly name each character by label (“Image 1 walks left while Image 2 sits down”) for predictable placement.
  • Voice references: Provide clean audio clips with minimal background noise for best voice-guided results.
  • Duration selection: Reference-to-Video supports 2-10 seconds. Start with shorter durations for iteration.
  • Negative prompts: Include terms such as “blurry, low quality, watermark, text, distortion, extra limbs” to suppress common artifacts.
  • Production integration: Use webhooks for scalable applications instead of polling.
  • Error handling: Implement retry with exponential backoff for 503 errors during high-demand periods.