Qwen-AgentWorld Outperforms Claude Opus and GPT-5.4 on New Agent...

The Qwen team has introduced Qwen-AgentWorld, a pair of open-weight models (35B MoE and 397B) trained specifically as Language World Models that simulate digital environments.

What Happened

The developers released two models, 35B MoE and 397B, that can simulate seven environments: web, terminal, coding, search, OS, Android, and MCP. The 397B model scored 58.71 on the AgentWorldBench benchmark, ahead of GPT-5.4 and Claude Opus (4.8). Training used more than 10 million environment interaction trajectories in a specialized pipeline of continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL).

Context

Unlike conventional LLMs, which are trained to predict text in general, Qwen-AgentWorld is trained to predict environment states: given the current state and an agent's action, it generates the state that follows. This makes it possible to produce high-quality synthetic trajectories for agent training, shifting the emphasis from limited real-world internet data to scalable training inside controlled digital simulations.
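As a rough illustration of how next-state prediction can yield synthetic trajectories, the sketch below rolls a scripted policy through a simulator and records each transition. Everything in it is hypothetical: ToyTerminalWorldModel is a hand-written stand-in for a learned model, and the predict_next_state interface, the Step record, and the rollout helper are assumptions made for illustration, not the Qwen-AgentWorld API.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class WorldModel(Protocol):
    """Assumed interface: map (state, action) to a predicted next state."""

    def predict_next_state(self, state: str, action: str) -> str: ...


@dataclass
class Step:
    state: str
    action: str
    next_state: str


class ToyTerminalWorldModel:
    """Hand-written stand-in for a learned simulator (hypothetical)."""

    def predict_next_state(self, state: str, action: str) -> str:
        # A real language world model would generate this text. Here a
        # terminal is faked with a canned response for each command.
        parts = action.split()
        if action.startswith("echo "):
            output = action[len("echo "):]
        else:
            name = parts[0] if parts else ""
            output = f"{name}: command not found"
        return f"{state}$ {action}\n{output}\n"


def rollout(
    model: WorldModel,
    policy: Callable[[str], str],
    initial_state: str,
    max_steps: int,
) -> List[Step]:
    """Collect a synthetic trajectory by simulating each action in turn."""
    trajectory: List[Step] = []
    state = initial_state
    for _ in range(max_steps):
        action = policy(state)
        next_state = model.predict_next_state(state, action)
        trajectory.append(Step(state, action, next_state))
        state = next_state
    return trajectory


def scripted_policy(state: str) -> str:
    # Toy policy: a valid command first, then an unknown one.
    return "echo hello" if state == "" else "frobnicate"


trajectory = rollout(ToyTerminalWorldModel(), scripted_policy, "", max_steps=2)
for step in trajectory:
    print(repr(step.action), "->", repr(step.next_state))
```

In a real setup the toy class would be replaced by calls to a trained simulator, and the recorded steps would become training data for an agent.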

Why It Matters for the Industry

Specialized simulator models let the industry scale agentic reinforcement learning (RL) in controlled digital environments, sidestepping the physical and infrastructure constraints of training against real systems. This could accelerate the development of general-purpose AI agents capable of deep planning and self-correction.

Why It Matters for Users

Because the weights are openly available on Hugging Face, researchers and developers can begin experimenting with environment simulation immediately. Users can also build their own agents for terminals, browsers, or operating systems using next-state prediction.
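One way an agent might use next-state prediction is to simulate each candidate action and pick the one whose predicted outcome looks best. The sketch below shows that one-step lookahead under loud assumptions: toy_predict is a hard-coded stand-in for a world model, goal_score is an invented scoring rule, and none of the names come from the Qwen-AgentWorld release.

```python
from typing import Callable, Sequence


def choose_action(
    state: str,
    candidates: Sequence[str],
    predict_next_state: Callable[[str, str], str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate whose simulated next state scores highest."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    return max(candidates, key=lambda a: score(predict_next_state(state, a)))


def toy_predict(state: str, action: str) -> str:
    # Hypothetical simulator: the state is a space-separated directory
    # listing, and "touch <name>" adds a file to it.
    if action.startswith("touch "):
        return (state + " " + action[len("touch "):]).strip()
    return state


def goal_score(state: str) -> float:
    # Invented goal: the listing should contain report.txt.
    return 1.0 if "report.txt" in state.split() else 0.0


best = choose_action(
    "notes.md",
    ["ls", "touch report.txt", "touch tmp"],
    toy_predict,
    goal_score,
)
print(best)
```

The real environment is never touched while candidates are compared, which is the main appeal of planning inside a simulator.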

What Is Not Yet Known / Limitations

Skeptics question how practical the models are to deploy, pointing to unresolved infrastructure issues and the high inference cost of 397B-scale models.
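To put the inference-cost concern in rough numbers, the sketch below estimates the memory needed just to hold the weights of a 397B-parameter model at a few common precisions. It assumes that 397B is the total parameter count, as the model name suggests, and it ignores KV cache, activations, and runtime overhead, so real requirements would be higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: 397e9 total parameters, read off the model name.
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(397e9, dtype):.1f} GB")
# bf16: 794.0 GB
# int8: 397.0 GB
# int4: 198.5 GB
```

Even at 4-bit precision the weights alone exceed the memory of any single commodity accelerator, so serving a model of this size means spreading it across several devices.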

Author

Look at AI, Editorial Staff