OmniDirector: A New Level of Camera Control in Video Generation...

The Kling team has introduced OmniDirector, a framework for cloning complex camera movements from existing videos. Instead of relying on text parameters or simple commands, the system uses a visual "camera grid" to specify how the camera should move during video generation.

What Happened

The OmniDirector framework encodes camera trajectories as a camera grid. This allows the model to be trained on millions of "grid-video" pairs without the need for specialized cross-paired data. To keep characters, their actions, and camera movement coordinated, it uses a hierarchical Prompt Expansion (PE) Agent built on top of Multimodal Diffusion Transformers (MMDiT).
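To make the idea of a trajectory encoded as a grid more concrete, here is a minimal sketch. It is not the Kling team's implementation, and the article does not describe how the camera grid is actually rendered. It assumes a simple pinhole camera looking along +z and a fixed reference grid on the ground plane; the names `Camera`, `project`, `ground_grid`, and `render_grid_frames` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    # Hypothetical pinhole camera: position in world space, looking along +z.
    x: float
    y: float
    z: float
    focal: float  # focal length in pixels


def project(cam, point, width=640, height=360):
    """Project a 3D world point to pixel coordinates; None if behind the camera."""
    px, py, pz = point
    dx, dy, dz = px - cam.x, py - cam.y, pz - cam.z
    if dz <= 0:
        return None
    u = width / 2 + cam.focal * dx / dz
    v = height / 2 - cam.focal * dy / dz
    return (u, v)


def ground_grid(n=5, spacing=1.0, depth=5.0):
    """Fixed reference grid of n*n points on the plane y = 0 (an assumption)."""
    half = (n - 1) / 2
    return [((i - half) * spacing, 0.0, depth + j * spacing)
            for j in range(n) for i in range(n)]


def render_grid_frames(trajectory, grid):
    """Return one list of projected 2D grid points per camera pose."""
    return [[project(cam, p) for p in grid] for cam in trajectory]


# Example trajectory: a simple dolly-in, with the camera 1.5 units above the ground.
trajectory = [Camera(0.0, 1.5, t * 0.5, 500.0) for t in range(4)]
frames = render_grid_frames(trajectory, ground_grid())
```

In this toy setup, the same fixed grid looks different in every frame purely because the camera moved, so the sequence of grid frames carries the motion without any explicit numeric parameters. A real system would presumably render such grids as images and handle rotation and zoom as well.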

Context

Before OmniDirector, a major obstacle in training camera control systems was the shortage of specialized datasets annotated with precise movement parameters. Moving from parametric control to visual-structural encoding via motion grids allows the system to bypass this data scarcity by using massive amounts of unpaired video material to capture cinematic techniques.

Why It Matters for the Industry

This technology addresses the problem of insufficient training data for camera control, enabling the transfer of complex techniques like Dolly Zoom or Bullet Time to new scenes. This paves the way for multi-shot videos with a unified narrative logic and lays the foundation for full-fledged AI directing platforms, where multimodal agents are controlled through visual layers.

Why It Matters for Users

For content creators, this means a transition from text descriptions to "director-level" control: the framework makes it possible to copy professional camera movement from an existing video and apply it to one's own generation. This could significantly lower the barrier to entry for professional video production, allowing complex scenes to be prototyped without deep knowledge of 3D animation.

What Is Not Yet Known / Limitations

At the current stage, OmniDirector is a research framework and is not ready for production use: there is no public API, the computational costs are unclear, and no data on latency has been published.

Sources

OmniDirector Project Page
[arXiv:2606.13432 [cs.CV]](https://arxiv.org/abs/2606.13432)

Author

Look at AI, Editorial Staff