JoyAI-Echo: Generating Long Videos up to 5 Minutes

The Echo team (Joy Future Academy, JD) has introduced JoyAI-Echo, a diffusion model for generating long audiovisual videos. The system uses Cross-Modal Audio-Visual Memory to prevent "identity drift" of characters and voices, while an optimized pipeline (DMD distillation) speeds up generation by 7.5x.

🌍 The model solves the problem of losing consistency as video duration increases, bringing AI generation closer to professional video production tools.

👤 It is now possible to create cohesive stories with the same characters, rather than just short clips, and to direct the process through text commands.

Source 1: http://echo-team-joy-future-academy-jd.github.io/Echo-LongVideo-Page/
Source 2: https://github.com/jd-opensource/JoyAI-Echo
