Wan-Move Examples

Explore examples of motion-controllable video generation with Wan-Move. This page showcases the various capabilities of the framework, including single-object motion, multi-object choreography, camera control, motion transfer, and 3D rotations.

Motion Control Capabilities

Wan-Move generates high-quality 5-second videos at 832×480 (480p) resolution with precise point-level motion control. Each example demonstrates how dense point trajectories guide the movement of objects within the generated videos.

Single-Object Motion Control

Control the movement of individual objects within a scene. Define trajectories that specify exactly where objects should move, and Wan-Move generates videos where those objects follow the specified paths while maintaining natural appearance and interaction with the environment.

  • Precise control over individual object movements
  • Natural motion that respects physics and scene context
  • Maintains visual quality throughout the video duration
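To make the trajectory format concrete, here is a minimal sketch of building a single-object path as a NumPy array. The layout (frames × points × xy) and the 81-frame count (5 s at 16 fps) are assumptions for illustration, not Wan-Move's documented API:

```python
import numpy as np

def linear_trajectory(start_xy, end_xy, num_frames=81, num_points=1):
    """Build a (num_frames, num_points, 2) array of x,y positions
    moving linearly from start_xy to end_xy."""
    start = np.asarray(start_xy, dtype=np.float32)
    end = np.asarray(end_xy, dtype=np.float32)
    t = np.linspace(0.0, 1.0, num_frames, dtype=np.float32)[:, None]  # (T, 1)
    path = start[None, :] * (1.0 - t) + end[None, :] * t              # (T, 2)
    # Repeat the same path for every tracked point on the object.
    return np.repeat(path[:, None, :], num_points, axis=1)

# One point sliding from (100, 240) to (700, 240) across the 832x480 frame.
tracks = linear_trajectory((100, 240), (700, 240), num_frames=81)
print(tracks.shape)  # (81, 1, 2)
```

Denser point grids over the object give the model a stronger motion constraint than a single point.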

Multi-Object Motion Control

Choreograph multiple objects simultaneously, each following independent trajectories. This capability enables complex scenes where different elements move in coordinated or independent patterns, creating dynamic compositions.

  • Independent trajectory control for multiple objects
  • Coordinated or independent motion patterns
  • Complex scene dynamics with multiple moving elements
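Multiple objects can be choreographed by concatenating their per-object trajectory arrays along the point axis, so each object keeps an independent path. A hedged sketch, again assuming a (frames, points, xy) layout:

```python
import numpy as np

def stack_object_tracks(*object_tracks):
    """Concatenate per-object (T, N_i, 2) trajectory arrays into one
    (T, sum(N_i), 2) array so each object keeps its own path."""
    frame_counts = {t.shape[0] for t in object_tracks}
    assert len(frame_counts) == 1, "all objects must span the same frame count"
    return np.concatenate(object_tracks, axis=1)

T = 81
t = np.linspace(0, 1, T, dtype=np.float32)
# Object A drifts right; object B drifts down; one tracked point each.
obj_a = np.stack([100 + 500 * t, np.full(T, 120, np.float32)], axis=-1)[:, None, :]
obj_b = np.stack([np.full(T, 400, np.float32), 50 + 300 * t], axis=-1)[:, None, :]
tracks = stack_object_tracks(obj_a, obj_b)
print(tracks.shape)  # (81, 2, 2)
```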

Camera Control

Simulate professional camera movements without physically moving a camera. Wan-Move supports various camera operations including linear displacement, dolly in, dolly out, panning, and other cinematic movements.

  • Linear displacement for smooth tracking shots
  • Dolly in and out for depth-based camera movement
  • Panning for horizontal scene exploration
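Camera moves can be expressed as trajectories too: a pan shifts every point uniformly, while a dolly scales point positions about the image center. The following is an illustrative sketch of generating such trajectories on a point grid; the parameterization is an assumption, not Wan-Move's camera API:

```python
import numpy as np

def camera_tracks(width=832, height=480, grid=8, num_frames=81,
                  pan=(0.0, 0.0), zoom=1.0):
    """Sketch of camera-style trajectories on a uniform point grid.
    pan: total (dx, dy) scene shift over the clip; zoom: final scale
    about the image center (>1 dollies in, <1 dollies out)."""
    xs = np.linspace(0, width, grid, dtype=np.float32)
    ys = np.linspace(0, height, grid, dtype=np.float32)
    pts = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)        # (N, 2)
    center = np.array([width / 2, height / 2], dtype=np.float32)
    t = np.linspace(0, 1, num_frames, dtype=np.float32)[:, None, None]  # (T,1,1)
    scale = 1.0 + (zoom - 1.0) * t
    shift = np.asarray(pan, dtype=np.float32) * t
    return center + (pts[None] - center) * scale + shift               # (T, N, 2)

pan_right = camera_tracks(pan=(-200.0, 0.0))  # scene shifts left -> camera pans right
dolly_in = camera_tracks(zoom=1.5)            # points spread outward from center
```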

Motion Transfer

Extract motion patterns from existing videos and apply them to different content. This enables successful motion templates to be reused across scenes and subjects, producing consistent motion styles.

  • Extract motion patterns from reference videos
  • Apply extracted motion to new content
  • Reuse successful motion templates
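Extracting trajectories from a reference video is typically done with an off-the-shelf point tracker; the retargeting step that follows is simple array arithmetic. Here is a hedged sketch of only that step, rescaling extracted tracks from the reference video's resolution to the new initial image's resolution (the (T, N, 2) layout is an assumption):

```python
import numpy as np

def retarget_tracks(tracks, src_size, dst_size):
    """Rescale extracted (T, N, 2) x,y trajectories from a reference
    video's (width, height) to the target image's (width, height), so
    the same motion can drive new content."""
    sw, sh = src_size
    dw, dh = dst_size
    scale = np.array([dw / sw, dh / sh], dtype=np.float32)
    return tracks.astype(np.float32) * scale

# Motion tracked at 1920x1080, reused on an 832x480 initial frame.
ref = np.random.default_rng(0).uniform(
    0, [1920, 1080], size=(81, 5, 2)).astype(np.float32)
tracks_480p = retarget_tracks(ref, (1920, 1080), (832, 480))
```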

3D Rotation

Generate videos showing objects rotating in three-dimensional space. This is particularly useful for product demonstrations, architectural visualization, and any application requiring 360-degree views of objects or scenes.

  • Full 360-degree rotations in 3D space
  • Suitable for product demonstrations
  • Architectural and object visualization
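A turntable-style rotation can be encoded as 2D trajectories by rotating 3D surface points about a vertical axis and projecting them with a simple pinhole camera. This is an illustrative construction under assumed camera parameters, not Wan-Move's internal method:

```python
import numpy as np

def turntable_tracks(points_3d, num_frames=81, total_deg=360.0,
                     focal=500.0, center=(416, 240), cam_dist=4.0):
    """Rotate 3D object points about the vertical (y) axis and project
    them with a pinhole camera to get 2D turntable trajectories."""
    angles = np.deg2rad(np.linspace(0, total_deg, num_frames))
    cx, cy = center
    out = np.empty((num_frames, len(points_3d), 2), dtype=np.float32)
    for i, a in enumerate(angles):
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # rotation about y
        p = points_3d @ R.T
        z = p[:, 2] + cam_dist            # push the object in front of the camera
        out[i, :, 0] = cx + focal * p[:, 0] / z
        out[i, :, 1] = cy + focal * p[:, 1] / z
    return out

# Eight corners of a cube spinning through a full revolution.
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)
tracks = turntable_tracks(cube * 0.5)
```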

How Wan-Move Works

Latent Trajectory Guidance

The core technique in Wan-Move is latent trajectory guidance. This method represents motion conditions by propagating features from the first frame along user-defined trajectories. The process involves:

  1. Provide an initial frame showing the starting state
  2. Define point trajectories specifying where elements should move
  3. The model propagates first-frame features along these trajectories
  4. Video generation respects the trajectory constraints while maintaining quality
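The propagation step above can be sketched in a toy form: sample each point's feature from the first frame, then write it to that point's location in every later frame. This is a simplified nearest-pixel illustration of the idea, not the model's actual latent-space implementation:

```python
import numpy as np

def propagate_features(first_frame_feat, tracks, visibility):
    """Toy sketch of latent trajectory guidance: copy each tracked
    point's first-frame feature to its (x, y) position in every frame,
    skipping occluded points.

    first_frame_feat: (H, W, C) feature map of the initial frame
    tracks:           (T, N, 2) x,y positions per frame
    visibility:       (T, N) boolean mask
    Returns (T, H, W, C) sparse condition maps (zeros where no point lands).
    """
    H, W, C = first_frame_feat.shape
    T, N, _ = tracks.shape
    cond = np.zeros((T, H, W, C), dtype=first_frame_feat.dtype)
    x0 = np.clip(tracks[0, :, 0].round().astype(int), 0, W - 1)
    y0 = np.clip(tracks[0, :, 1].round().astype(int), 0, H - 1)
    feats = first_frame_feat[y0, x0]  # (N, C) features at the start positions
    for t in range(T):
        xt = np.clip(tracks[t, :, 0].round().astype(int), 0, W - 1)
        yt = np.clip(tracks[t, :, 1].round().astype(int), 0, H - 1)
        vis = visibility[t]
        cond[t, yt[vis], xt[vis]] = feats[vis]
    return cond
```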

Input Requirements

Initial Image

A single frame showing the starting state of the scene. This image provides the appearance information for objects that will be animated.

Trajectory Data

NumPy arrays containing (x, y) coordinates for tracked points across all frames. Each point's path defines how the corresponding part of the scene should move.

Visibility Masks

Information about when points are visible or occluded. This helps the model handle situations where objects move behind other elements or leave the frame.

Text Prompt

A description of the scene that provides context for video generation. The prompt guides the overall appearance and style of the generated content.
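Taken together, the four inputs might be bundled as follows. The field names, shapes, and 81-frame count here are hypothetical placeholders for illustration, not Wan-Move's actual API:

```python
import numpy as np

def build_inputs(image, tracks, visibility, prompt):
    """Collect the four inputs into one dict; names are illustrative."""
    T, N, _ = tracks.shape
    assert visibility.shape == (T, N), "one visibility flag per point per frame"
    assert image.ndim == 3, "initial frame as an H x W x 3 array"
    return {
        "image": image,            # initial frame (appearance)
        "tracks": tracks,          # (T, N, 2) x,y trajectories (motion)
        "visibility": visibility,  # (T, N) occlusion information
        "prompt": prompt,          # text context for content and style
    }

inputs = build_inputs(
    image=np.zeros((480, 832, 3), dtype=np.uint8),
    tracks=np.zeros((81, 4, 2), dtype=np.float32),
    visibility=np.ones((81, 4), dtype=bool),
    prompt="a red car driving along a coastal road",
)
```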

Evaluation on MoveBench

MoveBench provides standardized test cases for evaluating motion-controllable video generation. The benchmark includes both single-object and multi-object scenarios across diverse content categories. Each test case includes reference images, trajectory annotations, visibility masks, and text descriptions in both English and Chinese.

Try Wan-Move

To experiment with Wan-Move, you can install it locally following the instructions on the installation page. The system requires:

  • Python environment with PyTorch 2.4.0 or later
  • Downloaded model weights (Wan-Move-14B-480P)
  • GPU hardware for practical inference times
  • Input images and trajectory data

Gradio Demo Coming Soon

The research team has indicated plans to release a Gradio demo interface that will provide a user-friendly way to interact with Wan-Move. This demo will allow users to upload images, define trajectories through an interactive interface, and generate videos without writing code.

Performance Comparisons

Qualitative comparisons show that Wan-Move produces videos with motion accuracy comparable to commercial systems. The framework has been evaluated against both academic methods such as Tora and commercial products such as Kling 1.5 Pro, demonstrating competitive motion controllability while remaining open for research and development.

Video Quality Characteristics

  • Resolution: 832×480 pixels
  • Duration: 5 seconds
  • Frame Rate: Standard video frame rate
  • Visual Fidelity: High-quality realistic appearance
  • Motion Accuracy: Precise trajectory following
  • Temporal Consistency: Smooth transitions between frames