🎙 WavTTS: Direct Audio Waveform Synthesis

WavTTS is a new zero-shot TTS framework, presented as the first to model raw audio waveforms directly instead of going through intermediate representations such as mel-spectrograms. The model combines Flow Matching with a Diffusion Transformer architecture.
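
For readers new to Flow Matching, here is a minimal toy sketch of the general idea, not the WavTTS implementation. It assumes the common straight-line path between noise and data. The function names, the 16-sample sine "waveform", and the oracle standing in for a trained network are all hypothetical, and the post does not specify WavTTS's actual path, conditioning, or sampler.

```python
import math
import random


def interpolate(noise, target, t):
    """Point on the straight-line path from noise (t=0) to target (t=1)."""
    return [(1.0 - t) * n + t * x for n, x in zip(noise, target)]


def target_velocity(noise, target):
    """Regression target for the velocity field along the straight path."""
    return [x - n for n, x in zip(noise, target)]


def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
# Toy "waveform": 16 samples of one sine period (assumption for illustration).
waveform = [math.sin(2 * math.pi * k / 16) for k in range(16)]
noise = [random.gauss(0.0, 1.0) for _ in waveform]

# Training-time quantities: a network would be fit to predict v_star from (x_t, t).
x_t = interpolate(noise, waveform, 0.5)
v_star = target_velocity(noise, waveform)


def oracle(x, t):
    """Stand-in for a trained model: returns the exact target velocity."""
    return v_star


# Sampling: start from noise and follow the velocity field to t=1.
sample = euler_sample(oracle, noise, steps=10)
```

In a real system the oracle would be replaced by a neural network (here, a Diffusion Transformer) conditioned on text and a speaker prompt, and the vectors would be full-length audio rather than 16 samples.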

🌍 Modeling the waveform directly avoids the information loss that comes with compressing audio into an intermediate representation such as a mel-spectrogram, which opens the way to more accurate, fully end-to-end speech synthesis systems.

👤 This is a step toward more natural voice cloning. The code is open source under the MIT license, while the model weights are released under CC BY-NC 4.0, which limits them to non-commercial use.

Source 1: https://wavtts.github.io/
Source 2: https://github.com/cwx-worst-one/WavTTS