Qwen-AgentWorld Outperforms Claude Opus and GPT-5.4

The Qwen team has introduced Qwen-AgentWorld, a set of new open-weight models (35B MoE and 397B) trained as world language models that simulate web, terminal, coding, search, OS, Android, and MCP environments. The 397B model scored 58.71 points on the AgentWorldBench benchmark, surpassing GPT-5.4 and Claude Opus 4.8.

🌍 Specialized simulator models make it possible to scale agentic reinforcement learning (RL) training inside simulated digital environments, without the constraints of running agents against real systems. This could accelerate the development of general-purpose AI agents with deep planning capabilities.

👤 You can use the open weights of these models to create your own agents capable of working effectively in terminals, browsers, or operating systems.

Source 1: https://arxiv.org/abs/2606.24597
Source 2: https://qwen.ai/blog?id=qwen-agentworld
