The DIRECT (Decomposed Injection for Reference Composition and Target-integration) framework inserts objects into images with precise control over their 3D pose. Unlike traditional 2D inpainting methods, the approach is designed to keep the inserted object realistic and physically consistent with the scene.


What Happened
DIRECT is built around a "visual triplet": the object's appearance, its 3D geometry (supplied as a render proxy), and the context of the target scene. The implementation combines three models, FLUX.1-Fill-dev, SigLIP2, and TRELLIS, to control pose and preserve texture quality.
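To make the triplet idea concrete, here is a minimal Python sketch of how the three inputs could be grouped and sanity-checked before being handed to a pipeline. It is an illustration only, not DIRECT's actual code: the names `ImageRef`, `VisualTriplet`, and `validate` are hypothetical, and the rule that the render proxy must match the scene's resolution is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical handle to an image: a file path plus its (width, height)."""
    path: str
    size: Tuple[int, int]


@dataclass(frozen=True)
class VisualTriplet:
    """The three inputs named in the article, grouped together (illustrative)."""
    appearance: ImageRef     # reference image showing what the object looks like
    render_proxy: ImageRef   # render of the object's 3D geometry in the desired pose
    scene_context: ImageRef  # target scene the object is inserted into

    def validate(self) -> None:
        # Assumption for this sketch: the proxy is rendered in the scene's
        # image frame, so the two must share the same resolution.
        if self.render_proxy.size != self.scene_context.size:
            raise ValueError(
                f"render proxy {self.render_proxy.size} does not match "
                f"scene {self.scene_context.size}"
            )


triplet = VisualTriplet(
    appearance=ImageRef("object.png", (512, 512)),
    render_proxy=ImageRef("proxy.png", (1024, 768)),
    scene_context=ImageRef("scene.png", (1024, 768)),
)
triplet.validate()  # passes silently when proxy and scene sizes agree
```

Keeping the three inputs as separate fields mirrors the decomposition the article describes: appearance, geometry, and scene context can each be swapped without touching the others.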
Context
Standard 2D inpainting methods often produce pose and geometry mismatches when adding a new object to an existing scene. 3D-aware compositing addresses this by working from the image's spatial structure rather than from pixels alone.
Why It Matters for the Industry
For the industry, this marks a shift from simple region filling to controllable 3D compositing. It opens the way to high-precision design tools, automated CGI pipelines, and generative art in which geometric control is an integral part of content creation.
Why It Matters for Users
Users and designers gain a tool that does more than "paste" an object onto a background: they can specify its exact rotation and position in space while maintaining natural lighting and detail. At its current stage, DIRECT is best suited to high-quality offline production, such as asset generation for film.
What Is Not Yet Known / Limitations
The current architecture is a complex multi-stage pipeline built on several heavy models, so high latency makes production deployment challenging.
Sources
- Direct 3D-Aware Object Insertion via Decomposed Visual Proxies (Project Page)
- Gong1130/DIRECT (GitHub Repository)
- superGong/DIRECT (Hugging Face)
Author
Look at AI, Editorial Team
