LiveEdit is a new framework for real-time video stream editing built on Wan2.1-based diffusion models. Using a three-stage distillation process and a mask caching system, it reaches 12.66 FPS, a level of interactivity previously unavailable for comparable methods.


What Happened

LiveEdit moves video editing from batch post-processing to interactive streaming. It uses a three-stage distillation to turn a powerful bidirectional model into an efficient unidirectional streaming editor, which cuts generation to just 4 steps. To reduce computation, it adds an AR-oriented Mask Cache that reuses mask tokens for unchanged background areas.
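The caching idea can be illustrated with a toy sketch. This is not LiveEdit's implementation: the class `MaskTokenCache`, its `step` method, and the scalar "tokens" are hypothetical names and simplifications chosen for illustration. It assumes each frame is split into token positions and that a boolean flag says which positions changed; only those are recomputed, and the rest come from the cache.

```python
from typing import Callable, List, Optional


class MaskTokenCache:
    """Toy cache (hypothetical): keeps the last computed output per token position."""

    def __init__(self, compute: Callable[[float], float]) -> None:
        self.compute = compute  # stand-in for the expensive per-token work
        self.cached: List[Optional[float]] = []
        self.recomputed = 0  # how many tokens were actually recomputed so far

    def step(self, tokens: List[float], changed: List[bool]) -> List[float]:
        if len(tokens) != len(changed):
            raise ValueError("tokens and changed must have the same length")
        # Reset the cache if the token layout changes (e.g. first frame).
        if len(self.cached) != len(tokens):
            self.cached = [None] * len(tokens)
        out: List[float] = []
        for i, (tok, is_changed) in enumerate(zip(tokens, changed)):
            # Recompute only changed positions or positions never seen before.
            if is_changed or self.cached[i] is None:
                self.cached[i] = self.compute(tok)
                self.recomputed += 1
            out.append(self.cached[i])
        return out


cache = MaskTokenCache(compute=lambda t: t * 2.0)
# First frame: the cache is cold, so all four positions are computed.
frame1 = cache.step([1.0, 2.0, 3.0, 4.0], [False, False, False, False])
# Second frame: only position 2 is flagged as changed and recomputed.
frame2 = cache.step([1.0, 2.0, 9.0, 4.0], [False, False, True, False])
print(frame1, frame2, cache.recomputed)
```

In this toy run, the second frame triggers one recomputation instead of four. The saving in a real system would depend on how much of each frame is unchanged background.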

Context

Diffusion-based video editing methods typically need significant time to process completed files (post-processing), which rules out interacting with video on the fly. The project builds on the Wan2.1 architecture and targets latency and background stability problems in frame-by-frame processing.

Why It Matters for the Industry

For the industry, this marks a shift from batch processing to interactive stream editing, which is critical for the development of Augmented Reality (AR) technologies. The technology paves the way for new SDKs for streaming platforms and for content creator tools that work with real-time AI effects.

Why It Matters for Users

Users and streamers can change the visual parameters of a video during a live broadcast with minimal latency, for example by adjusting lighting, changing clothing color, or applying stylistic filters. This makes high-quality interactive content possible without expensive studio solutions.

What Is Not Yet Known / Limitations

Despite the impressive metrics, the project remains at the research stage (presented at ECCV 2026), so caution is warranted when planning its integration into commercial enterprise systems.

Author

Look at AI, Editorial Team