Retriever AI has introduced Code-as-Plan, a new approach to running browser agents. The approach is said to cut LLM operating costs roughly a hundredfold by turning the language model from an expensive processor invoked for every action into a compiler that emits code.

What Happened

Retriever AI's new browser agent architecture, Code-as-Plan, no longer relies on an LLM to process every click and analyze the interface visually. Instead, DeepSeek Flash acts as a compiler that generates JavaScript code, which a specialized harness then executes locally. The agent works from a text representation of the DOM rather than transmitting heavy screenshots.
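The flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Retriever AI's implementation: the `Page`, `Element`, `stub_planner` and `run_task` names are hypothetical, the planner is a hard-coded stub standing in for the single model call, and the plan is Python source rather than JavaScript so that the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Element:
    """Toy stand-in for a DOM node (hypothetical, for illustration only)."""
    tag: str
    id: str
    text: str = ""
    value: str = ""
    clicked: bool = False


@dataclass
class Page:
    """Toy page: a flat list of elements that a local harness can act on."""
    elements: List[Element] = field(default_factory=list)

    def to_text(self) -> str:
        # Compact text view of the page, sent to the planner instead of a screenshot.
        return "\n".join(f"<{e.tag} id={e.id}> {e.text}".rstrip() for e in self.elements)

    def find(self, element_id: str) -> Element:
        return next(e for e in self.elements if e.id == element_id)

    def fill(self, element_id: str, value: str) -> None:
        self.find(element_id).value = value

    def click(self, element_id: str) -> None:
        self.find(element_id).clicked = True


def stub_planner(task: str, dom_text: str) -> str:
    """Assumption: stands in for one LLM call that returns a whole plan as code."""
    return "page.fill('q', 'browser agents')\npage.click('go')\n"


def run_task(task: str, page: Page,
             planner: Callable[[str, str], str] = stub_planner) -> str:
    # One planner call per task, not one call per click.
    plan_source = planner(task, page.to_text())
    # Execute the plan locally. Emptying the builtins only limits what the plan
    # can reach in this toy; it is NOT a real security sandbox.
    exec(plan_source, {"__builtins__": {}}, {"page": page})
    return plan_source


page = Page([Element("input", "q"), Element("button", "go", "Search")])
run_task("search for browser agents", page)
print(page.find("q").value, page.find("go").clicked)
```

The point of the sketch is the shape of the loop: the model sees text, is called once, and returns a program, while every individual action runs locally without further model calls.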

Context

Traditional multimodal agents rely on vision-based execution: they continually send screenshots to powerful models for image analysis. The result is the so-called "multimodal tax": high cost and latency at every step of the interaction with the interface.
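Why a per-step charge adds up can be shown with a back-of-the-envelope cost model. The sketch below is illustrative only: the function names are hypothetical, prices are in arbitrary units, and no figures here are measurements from Retriever AI or any provider.

```python
def per_step_cost(steps: int, tokens_per_step: int, price_per_token: float) -> float:
    """Vision-style agent: every step pays for a full model call."""
    return steps * tokens_per_step * price_per_token


def plan_once_cost(plan_tokens: int, price_per_token: float) -> float:
    """Plan-as-code agent: one model call per task; local execution is treated as free."""
    return plan_tokens * price_per_token


def cost_ratio(steps: int, tokens_per_step: int, big_model_price: float,
               plan_tokens: int, small_model_price: float) -> float:
    """How many times more the per-step agent costs for the same task."""
    return (per_step_cost(steps, tokens_per_step, big_model_price)
            / plan_once_cost(plan_tokens, small_model_price))
```

Under this simple model the ratio grows linearly with the number of steps in a task and with the price gap between the two models, which is why savings are largest on long automation sessions.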

Why It Matters for the Industry

The industry is shifting from multimodal control to efficient text-based agents driven by code. Using cheap, efficient models such as DeepSeek Flash as planners undermines the economic monopoly of large labs, which could previously dictate API prices to subsidize their own agent platforms. The emerging pattern is LLM-as-Compiler: the model handles high-level logic while a local runtime handles execution.

Why It Matters for Users

Building and running autonomous AI agents is becoming orders of magnitude cheaper and faster. For end users, this means a new generation of fast, capable browser assistants that respond almost instantly and do not run up large bills for every automation session.

Author

Look at AI, Editorial Team