🎧 Sber Releases Open-Source KVAE-Audio

The Kandinsky Lab team has released KVAE-Audio, an open-source audio tokenizer under the MIT license. The model operates at a 48 kHz sampling rate and compresses audio 960x along the time axis (48,000 / 960 = 50 latent frames per second), representing sound in a compact 64-channel latent space.
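The compression figures translate into concrete latent sizes. Below is a minimal sketch of that arithmetic. It assumes a mono 48 kHz input, one latent frame per 960 samples, and a partial final window rounded up. The helper name `latent_shape` is hypothetical and is not part of the KVAE-Audio API.

```python
import math

SAMPLE_RATE_HZ = 48_000      # from the announcement
TEMPORAL_COMPRESSION = 960   # from the announcement
LATENT_CHANNELS = 64         # from the announcement


def latent_shape(duration_s: float) -> tuple[int, int]:
    """Estimate (latent_channels, latent_frames) for a mono clip.

    Hypothetical helper. Assumes one latent frame per 960 input samples,
    with a partial final window rounded up (actual padding is not documented here).
    """
    num_samples = round(duration_s * SAMPLE_RATE_HZ)
    frames = math.ceil(num_samples / TEMPORAL_COMPRESSION)
    return LATENT_CHANNELS, frames


# Latent frames per second of audio.
frame_rate = SAMPLE_RATE_HZ / TEMPORAL_COMPRESSION
# Latent values per second of audio (channels x frames), assuming mono input.
values_per_second = LATENT_CHANNELS * frame_rate

print(f"latent frame rate: {frame_rate:.0f} Hz")
print(f"latent values per second: {values_per_second:.0f}")
print(latent_shape(10.0))
```

Under these assumptions a mono clip goes from 48,000 samples per second to 64 × 50 = 3,200 latent values per second. That is about a 15x reduction in raw values, because the 64 latent channels offset part of the 960x temporal compression.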

🌍 KVAE-Audio serves as an efficient alternative to heavyweight solutions like Stability AI's SAME-L. According to the developers, it delivers comparable quality with roughly 5x fewer parameters (166.9M vs. 852.1M).

👤 Developers gain a high-quality tool for audio content creation that requires far fewer computational resources.

Source 1: https://habr.com/ru/companies/sberbank/articles/1053410/
Source 2: https://github.com/kandinskylab/kvae-audio