The Rio de Janeiro Administration has introduced Rio 3.5 Open 397B, a large multimodal language model built on the Mixture-of-Experts (MoE) architecture and featuring a context window of roughly one million tokens.

What Happened

Rio 3.5 Open 397B was developed by merging Nex-N2-Pro and Qwen3.5-397B-A17B using On-Policy Distillation to improve quality. Although the model has 397 billion parameters in total, its MoE architecture activates only about 17 billion of them for each token. It supports a context window of up to 1,010,000 tokens and is distributed under the open-source MIT license.

Context

The development relies on distillation, that is, transferring knowledge from more capable systems into base models of the Qwen family. Because an MoE model routes each token through only a subset of its experts, it can hold a very large number of parameters while spending less compute per token than a dense model of similar total size.
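The per-token saving can be illustrated with a back-of-the-envelope calculation. The sketch below uses the parameter counts quoted in this article and a common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass. That rule is an assumption made here for illustration, not a figure from the model's authors, and it ignores attention cost over long contexts. The function names are hypothetical.

```python
# Rough comparison of per-token compute: MoE vs. a dense model of the same total size.
# Assumption: ~2 FLOPs per active parameter per token (forward pass only),
# ignoring attention over the context and any routing overhead.

TOTAL_PARAMS = 397e9   # total parameters, as reported in the article
ACTIVE_PARAMS = 17e9   # approximate active parameters per token, as reported


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that take part in processing one token."""
    return active_params / total_params


def forward_flops_per_token(params_used: float) -> float:
    """Rule-of-thumb forward-pass cost for one token (an approximation)."""
    return 2.0 * params_used


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
ratio = dense_flops / moe_flops

print(f"Active share of parameters: {frac:.1%}")
print(f"Dense / MoE compute per token: {ratio:.1f}x")
```

Under these assumptions, about 4.3% of the parameters are used per token, and a dense model of the same total size would need roughly 23 times more compute per token. Real-world throughput depends on hardware, batching, and context length, so this is only an order-of-magnitude guide.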

Why It Matters for the Industry

The arrival of a capable open-weights model with ultra-long context support intensifies competition with proprietary systems such as GPT-4. It also demonstrates the effectiveness of knowledge distillation and paves the way for high-performance local analytical systems, reducing businesses' dependence on closed APIs.

Why It Matters for Users

For developers and researchers, this means they can use state-of-the-art capabilities in their own projects, locally and at no licensing cost. The context window is large enough to process entire books or large codebases in a single request, which makes the model well suited to building complex autonomous agents and performing deep data analysis.
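Whether a given book or codebase actually fits in a single request can be estimated before sending anything to the model. The sketch below is a rough check only: the ratio of 4 characters per token is a generic heuristic assumed here, not a property of this model's tokenizer, and the output reserve and function names are hypothetical. An exact count requires the model's own tokenizer.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_010_000  # context size reported in the article
CHARS_PER_TOKEN = 4.0              # assumption: generic heuristic, varies by language and tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate the token count of a text from its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, reserved_for_output: int = 8_000) -> bool:
    """Check whether the text plus a reserve for the model's answer fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


# Example: a text of 2 million characters (several long novels' worth).
sample = "x" * 2_000_000
print(estimate_tokens(sample), fits_in_context(sample))
```

Source code and non-English text often tokenize less efficiently than English prose, so it is safer to leave a generous margin rather than fill the window to its limit.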

What Is Not Yet Known / Limitations

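Practical use of the model is limited by very high VRAM requirements for deployment: with an MoE model, all 397 billion parameters typically have to be held in memory, even though only about 17 billion are active per token. There is also no public data yet on latency in real-world workflows.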
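The scale of the memory requirement can be estimated from the parameter count alone. The sketch below computes the space needed for the weights at a few common numeric precisions. The precision list is an assumption for illustration: the article does not say which formats the released weights use. The figures also exclude the KV cache, activations, and quantization overhead, all of which add to the total, especially at long context lengths.

```python
# Lower-bound estimate of memory needed just to hold the model weights.
# Assumption: every parameter is stored at the same precision; KV cache,
# activations, and framework overhead are NOT included.

TOTAL_PARAMS = 397e9  # total parameters, as reported in the article

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit floating point
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weight_memory_gb(total_params: float, precision: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_memory_gb(TOTAL_PARAMS, precision):.1f} GB")
```

Even at 4-bit precision the weights alone come to roughly 200 GB under these assumptions, which is beyond a single consumer GPU and generally implies a multi-GPU server or offloading part of the model to system memory.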

Author

Look at AI, Editorial Team