The new version, Sonilo v1.1, has been introduced—a specialized AI model for generating music based on video content. The update focuses on improving rhythmic and emotional synchronization, as well as the ability to integrate soundtracks without losing the original speech in the video.


What Happened
Sonilo developers have released version v1.1, which significantly improves the quality of video-to-music generation. During testing, rhythm synchronization and alignment with video emotions reached a 78% user preference rate. New features include segmented control via prompts for different parts of the video and a function to preserve original speech when overlaying music. The model is available via API, the fal.ai platform, and ComfyUI.
Context
Unlike standard text-to-music solutions, Sonilo v1.1 is focused on creating a semantically linked audiovisual experience. To ensure commercial safety when using the model in professional production, a licensed dataset from Shutterstock is utilized.
Why It Matters for the Industry
For the industry, this signifies a shift from simple background noise overlay to deep automation of sound design. The use of licensed data guarantees the legal purity of content, while availability through API and ComfyUI allows for integrating audio generation into automated video production workflows ('video-to-ready-video').
Why It Matters for Users
Content creators—from bloggers to advertising agencies—gain a tool for quickly creating professional soundtracks. Automating synchronization processes and music ducking under voice significantly reduces the time spent on manual audio editing for formats such as vlogs, tutorials, and commercials.
What Is Not Yet Known / Limitations
Technical assessment indicates a lack of detailed data regarding the fundamental scientific novelty of the architecture; the update is characterized more as a qualitative product improvement of existing mechanisms.
Sources
Author
Look at AI, Editorial Team
