🛠 OpenMOSS has introduced MOSS-Video-Preview — a multimodal model for real-time video understanding.

The architecture, based on Llama-3.2-Vision, utilizes gated cross-attention and streaming inference to process video streams with minimal latency.

🌍 This is a step toward interactive AI assistants that respond to video in real-time without buffering.

👤 AI will be able to "see" and discuss live events almost instantaneously.

Source 1: https://github.com/OpenMOSS/MOSS-Video-Preview Source 2: https://huggingface.co/collections/OpenMOSS-Team/moss-video-preview