JoyAI-VL-Interaction: The First Open-Source Real-Time Interactive VLM

The Joy Future Academy (JD) team has introduced JoyAI-VL-Interaction, an 8B-parameter interactive vision-language model (VLM) that makes a decision every second.

Compiled by Sergey Kostenchuk. Published: 2026-06-16. Updated: 2026-06-16.


🤖 JoyAI-VL-Interaction: The First Real-Time Interactive VLM

The Joy Future Academy (JD) team has introduced an open-source 8B-parameter model that analyzes a video stream once per second and decides whether to speak, stay silent, or delegate the task to an agent. According to the team, the system outperforms Gemini and Doubao on rapid-response tasks.

🌍 The shift toward "presence agents" changes the paradigm: instead of only answering queries, the AI continuously monitors its environment, which suits live video surveillance, sports, and navigation.

👤 This is a step toward assistants that notice important events on their own (for example, a person falling) and immediately act or warn you.
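The post does not describe the model's actual interface, so the following is only a minimal sketch of the per-second speak / stay silent / delegate decision described above. All names here (`Action`, `Frame`, `decide`, `run_stream`, and the `navigation_agent` target) are hypothetical, and a simple lookup on a text label stands in for the real model's reasoning over video.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Action(Enum):
    SILENT = "silent"      # keep watching, say nothing
    SPEAK = "speak"        # respond to the user directly
    DELEGATE = "delegate"  # hand the task off to an external agent


@dataclass
class Frame:
    """One sampled frame per second (hypothetical: a text label stands in for pixels)."""
    t: int      # timestamp in seconds
    event: str  # e.g. "idle", "person_falling", "route_request"


@dataclass
class Decision:
    t: int
    action: Action
    message: Optional[str] = None


def decide(frame: Frame) -> Decision:
    """Hypothetical stand-in for the model's per-second decision.

    The real JoyAI-VL-Interaction model reasons over video and dialogue context;
    this label lookup only illustrates the three possible outcomes.
    """
    if frame.event == "person_falling":
        return Decision(frame.t, Action.SPEAK, "Warning: someone appears to have fallen.")
    if frame.event == "route_request":
        return Decision(frame.t, Action.DELEGATE, "navigation_agent")  # hypothetical agent name
    return Decision(frame.t, Action.SILENT)


def run_stream(frames: Iterable[Frame]) -> List[Decision]:
    """Make exactly one decision per sampled frame, i.e. once per second."""
    return [decide(f) for f in frames]


if __name__ == "__main__":
    stream = [Frame(0, "idle"), Frame(1, "person_falling"), Frame(2, "route_request"), Frame(3, "idle")]
    for d in run_stream(stream):
        if d.action is not Action.SILENT:
            print(d.t, d.action.value, d.message)
```

In this sketch, staying silent is the default outcome, so the assistant only interrupts (or hands off to an agent) when a frame warrants it, as in the falling-person example.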

Source 1: https://joyai-vl-video-future-academy-jd.github.io/JoyAI-VL-Interaction/
Source 2: https://github.com/jd-opensource/JoyAI-VL-Interaction/
