ByteDance Releases 'Magic Brush' ATI for Video Generation with 90% Navigation Success, Now Open-Sourced!

ByteDance unveils the innovative ATI model, enabling precise, trajectory-based video creation, achieving 90% success in complex scenes, and open-sourcing the high-performance model for developers.

[Image]

Angtian Wang is a researcher at ByteDance, specializing in video generation, 3D vision, and differentiable rendering. He earned his PhD from Johns Hopkins University under Dr. Alan Yuille.

In recent years, advances in diffusion models, Transformer architectures, and high-performance visual understanding models have driven rapid progress in video generation, especially in image-to-video (I2V) tasks that produce dynamic, temporally coherent content from minimal input.

However, despite these quality improvements, a key bottleneck remains: the lack of effective, intuitive, and user-friendly motion control.

Users often have clear motion intentions, such as a character running in a specific direction, the camera zooming in, or an animal jumping, but current methods rely on preset templates, action tags, or style prompts and offer no flexible, precise way to specify object and camera movements. This limits both creative expression and practical applications.

To address this, ByteDance introduced ATI, a novel trajectory-based controllable video generation framework. ATI's core idea is to convert user-drawn trajectories on an input image into explicit control signals for object and camera motion, integrated into a unified latent-space model. Users can simply "draw where to move, and it moves there," gaining precise, frame-level control over the generated motion.

[Image]

Method

ATI takes two inputs: a static image and a set of user-drawn trajectories, which can be any shape: straight, curved, looped, or abstract. A Gaussian Motion Injector encodes these trajectories into motion vectors that guide the diffusion process, producing frame-by-frame object and camera movements.
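To make the encoding step concrete, here is a minimal sketch, assuming the common approach of rendering each per-frame trajectory point into a Gaussian weight map on the latent grid. The function name, grid size, and sigma are illustrative assumptions, not ATI's released implementation.

```python
import torch

def gaussian_motion_map(points, height, width, sigma=4.0):
    """Render one trajectory (one (x, y) point per frame) into per-frame
    Gaussian weight maps.

    points: tensor of shape (num_frames, 2) in pixel coordinates.
    Returns a tensor of shape (num_frames, height, width).
    """
    ys = torch.arange(height, dtype=torch.float32).view(1, height, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, 1, width)
    px = points[:, 0].view(-1, 1, 1)  # (F, 1, 1)
    py = points[:, 1].view(-1, 1, 1)
    dist2 = (xs - px) ** 2 + (ys - py) ** 2  # broadcasts to (F, H, W)
    return torch.exp(-dist2 / (2 * sigma ** 2))

# Example: a point dragged left-to-right across 16 frames on a 64x64 latent grid.
frames = 16
traj = torch.stack([
    torch.linspace(8.0, 56.0, frames),  # x sweeps right
    torch.full((frames,), 32.0),        # y stays centered
], dim=1)
heatmaps = gaussian_motion_map(traj, height=64, width=64)
print(heatmaps.shape)  # torch.Size([16, 64, 64])
```

Each frame's map peaks at the position the drawn trajectory reaches at that frame, giving the denoiser a soft spatial target for where the controlled content should be.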

[Image]

As shown above, the model "understands" the user’s drawn trajectory and generates smooth, natural motion accordingly. The Gaussian Motion Injector and Pixel-wise Channel Fusion enable unified control of object motion, local body part movements, and camera angles, supporting multi-object, multi-style, and multi-task video generation across models like Seaweed-7B and Wan2.1-I2V-14B.
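The fusion step is described only at a high level. One plausible reading of pixel-wise channel fusion is concatenating motion features with the video latents along the channel axis and projecting back to the latent width; the sketch below follows that assumption, and the class name and tensor shapes are hypothetical rather than taken from the released code.

```python
import torch
import torch.nn as nn

class PixelwiseChannelFusion(nn.Module):
    """Hypothetical fusion layer: concatenate motion features with video
    latents channel-wise, then project back to the latent channel count."""
    def __init__(self, latent_channels: int, motion_channels: int):
        super().__init__()
        self.proj = nn.Conv3d(latent_channels + motion_channels,
                              latent_channels, kernel_size=1)

    def forward(self, latents, motion_feats):
        # latents: (B, C_lat, F, H, W); motion_feats: (B, C_mot, F, H, W)
        fused = torch.cat([latents, motion_feats], dim=1)
        return self.proj(fused)

fusion = PixelwiseChannelFusion(latent_channels=16, motion_channels=4)
latents = torch.randn(1, 16, 16, 64, 64)
motion = torch.randn(1, 4, 16, 64, 64)
print(fusion(latents, motion).shape)  # torch.Size([1, 16, 16, 64, 64])
```

The appeal of a per-pixel fusion like this is that object motion, local body-part motion, and camera motion can all be expressed in the same spatial feature map and handled by one model.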

Results

[Image]

Users can draw trajectories directly on the original image with a finger or mouse, and ATI captures these paths and injects them into the diffusion model in real time. Whether the path is straight, curved, or a complex loop, the system converts it into coherent, natural animation: "draw where to move, and it moves there."
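As a rough illustration of how a freehand stroke can be turned into one target position per video frame, here is an arc-length resampling sketch; the function and frame count are illustrative assumptions, not part of the ATI codebase.

```python
import numpy as np

def resample_trajectory(stroke_xy, num_frames):
    """Resample a freehand stroke (list of (x, y) pixels in draw order)
    into exactly `num_frames` points, evenly spaced along arc length,
    so each video frame gets one target position."""
    pts = np.asarray(stroke_xy, dtype=np.float64)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    if arc[-1] == 0:  # degenerate stroke: a single click
        return np.repeat(pts[:1], num_frames, axis=0)
    targets = np.linspace(0.0, arc[-1], num_frames)
    x = np.interp(targets, arc, pts[:, 0])
    y = np.interp(targets, arc, pts[:, 1])
    return np.stack([x, y], axis=1)

# A rough curved stroke captured from mouse events, resampled to 16 frames.
stroke = [(10, 50), (30, 40), (60, 42), (90, 55), (120, 80)]
print(resample_trajectory(stroke, 16).shape)  # (16, 2)
```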

[Image]

In portrait or animal scenes, users can specify key actions like running, jumping, or arm waving. ATI samples and encodes keypoints in each frame, accurately reproducing natural, biomechanically plausible movements.

[Image]

For scenes with multiple targets, ATI can process up to 8 trajectories simultaneously. It uses spatial masks and channel separation to ensure each object’s identity remains distinct, enabling coherent group interactions.
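The exact separation mechanism is not detailed here. A simple way to keep identities distinct is to render each trajectory into its own channel; the sketch below follows that assumption (the 8-trajectory cap comes from the article, everything else is hypothetical).

```python
import torch

MAX_TRAJECTORIES = 8  # upper bound mentioned in the article

def stack_trajectories(trajectories, num_frames, height, width, sigma=4.0):
    """Render each trajectory (a (num_frames, 2) tensor of pixel coords) into
    its own channel of Gaussian weight maps so objects remain separable.
    Returns (MAX_TRAJECTORIES, num_frames, height, width); unused slots stay zero."""
    assert len(trajectories) <= MAX_TRAJECTORIES
    ys = torch.arange(height, dtype=torch.float32).view(1, height, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, 1, width)
    out = torch.zeros(MAX_TRAJECTORIES, num_frames, height, width)
    for i, traj in enumerate(trajectories):
        px = traj[:, 0].view(-1, 1, 1)
        py = traj[:, 1].view(-1, 1, 1)
        out[i] = torch.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return out

# Two objects moving in opposite directions over 16 frames on a 64x64 grid.
t = torch.linspace(0, 1, 16)
obj_a = torch.stack([8 + 48 * t, torch.full_like(t, 20.0)], dim=1)
obj_b = torch.stack([56 - 48 * t, torch.full_like(t, 44.0)], dim=1)
print(stack_trajectories([obj_a, obj_b], 16, 64, 64).shape)  # (8, 16, 64, 64)
```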

Moreover, ATI supports camera trajectory control: users can draw zoom, pan, or rotate paths on the original image, and these are combined with object trajectories to produce cinematic effects such as panning, following, and tilting shots.
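Camera motion can be expressed in the same point-track form as object motion. The sketch below approximates a zoom-in as a grid of anchor points drifting away from the image center over time; it illustrates the idea only and is not ATI's actual camera encoding.

```python
import numpy as np

def zoom_trajectories(width, height, num_frames, zoom=1.5, grid=4):
    """Approximate a camera zoom-in as a grid of point tracks: each anchor
    point moves away from the image center as the zoom factor ramps up.
    Returns an array of shape (grid * grid, num_frames, 2)."""
    center = np.array([width / 2.0, height / 2.0])
    xs = np.linspace(width * 0.2, width * 0.8, grid)
    ys = np.linspace(height * 0.2, height * 0.8, grid)
    anchors = np.array([(x, y) for y in ys for x in xs])  # (P, 2)
    scales = np.linspace(1.0, zoom, num_frames)           # per-frame zoom factor
    offsets = anchors - center                            # (P, 2)
    # Broadcast (P, 1, 2) offsets against (1, F, 1) scales -> (P, F, 2) tracks.
    return center + offsets[:, None, :] * scales[None, :, None]

tracks = zoom_trajectories(1280, 720, num_frames=16, zoom=1.4)
print(tracks.shape)  # (16, 16, 2): 16 anchor points, 16 frames, (x, y) each
```

A pan would instead add the same per-frame offset to every anchor, and a rotation would rotate the offsets about the center, so all three camera moves reduce to the same trajectory interface the user already draws for objects.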

[Image]

ATI demonstrates strong cross-domain generalization, producing styles ranging from realistic film footage to cartoons, oil paintings, watercolor, and game art. Swapping the reference image and input trajectories yields style-preserving motion videos for a wide range of applications.

[Image]

Users can also draw trajectories that defy physical limits; in latent space these produce sci-fi or fantasy effects such as flying, stretching, or twisting, expanding the creative possibilities for imaginative scenes.

Based on the high-precision Wan2.1-I2V-14B model, ATI can generate photorealistic videos with detailed facial expressions, textures, and lighting effects. A lightweight Seaweed-7B version is also available for real-time interactions in resource-limited environments.

Open Source

ATI's Wan2.1-I2V-14B model is now officially open-sourced on Hugging Face, giving researchers and developers access to high-quality, controllable video generation. The community ecosystem is growing quickly: Kijai's ComfyUI-WanVideoWrapper plugin supports FP8-quantized checkpoints (e.g., Wan2_1-I2V-ATI-14B_fp8_e4m3fn.safetensors), significantly reducing the memory needed for inference on consumer-grade GPUs, and Benji's YouTube tutorial "ComfyUI Wan 2.1 Arbitrary Trajectory Control" offers detailed hands-on guidance. Full code and models are available on GitHub (bytedance/ATI) and Hugging Face.
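For readers who want to inspect the FP8 checkpoint outside ComfyUI, here is a minimal sketch of loading and upcasting the weights with the safetensors library. The local file path and the bf16 upcast policy are assumptions for illustration, not part of the official workflow.

```python
# Minimal sketch: inspect the FP8 checkpoint named in the article.
# Assumes the file has already been downloaded to the working directory.
import torch
from safetensors.torch import load_file

state_dict = load_file("Wan2_1-I2V-ATI-14B_fp8_e4m3fn.safetensors")

# FP8 (e4m3) tensors are typically upcast to bf16/fp16 before running on
# GPUs without native FP8 matmul support; this policy is an assumption here.
state_dict = {
    name: (t.to(torch.bfloat16) if t.dtype == torch.float8_e4m3fn else t)
    for name, t in state_dict.items()
}
print(f"{len(state_dict)} tensors loaded")
```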
