Chinese company Moonshot AI has released Kimi K2.7 Code HighSpeed, a specialized high-speed mode for its multimodal coding model. The mode generates up to 260 tokens per second, several times faster than the standard version, which brings AI-assisted coding closer to real-time interaction.

What Happened
Moonshot AI introduced the HighSpeed mode for the Kimi K2.7 Code model. For typical queries, generation speed is approximately 180 tokens per second, rising to as much as 260 tokens per second on short contexts. The mode is already available to Kimi Code Beta participants, Kimi API developers, and corporate clients via Kimi Business.
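To put the reported speeds in perspective, the short Python sketch below estimates how long pure token generation would take for responses of different sizes. It is a back-of-the-envelope calculation, not a benchmark: the function name and the sample response sizes are illustrative assumptions, and the estimate ignores network latency and time to first token.

```python
# Reported HighSpeed figures from the announcement (tokens per second).
TYPICAL_TPS = 180.0  # approximate speed for typical queries
PEAK_TPS = 260.0     # upper bound reported for short contexts


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate pure decoding time, ignoring latency and time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Sample response sizes are illustrative assumptions, not measured values.
for tokens in (500, 2_000, 8_000):
    typical = generation_seconds(tokens, TYPICAL_TPS)
    peak = generation_seconds(tokens, PEAK_TPS)
    print(f"{tokens:>5} tokens: ~{typical:.1f} s typical, ~{peak:.1f} s at peak")
```

Under these assumptions, even a response of several thousand tokens would finish in well under a minute of decoding time.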
Context
High-speed inference modes respond to the need for lower latency in AI-assisted development cycles. Such optimizations may include specialized decoders or KV-cache optimization, which can turn LLMs from "question-answer" tools into full-fledged components of a continuous development workflow.
Why It Matters for the Industry
For the industry, this launch marks a shift in competition from code generation quality alone to inference efficiency. High throughput paves the way for autonomous coding agent systems that can perform iterative tasks, such as unit testing and bug fixing, in real time without interrupting the workflow.
Why It Matters for Users
Developers can use Kimi K2.7 Code in a mode that is as close to interactive as possible, which is critical when working with large volumes of code. The cost should be weighed, however: the HighSpeed mode may be more expensive, as its API rates may be twice as high as standard rates.
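The trade-off between speed and price can be sketched with simple arithmetic. In the Python example below, only the 180 tokens per second figure and the possible twofold price difference come from the announcement. The standard-mode speed, the base price, and the workload size are hypothetical placeholders chosen for illustration, and the names `Mode` and `job_estimate` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    tokens_per_second: float
    usd_per_million_output_tokens: float


def job_estimate(mode: Mode, output_tokens: int) -> tuple[float, float]:
    """Return (decoding seconds, output-token cost in USD) for one workload."""
    seconds = output_tokens / mode.tokens_per_second
    cost = output_tokens / 1_000_000 * mode.usd_per_million_output_tokens
    return seconds, cost


# HYPOTHETICAL values: the standard speed and base price are not published here.
BASE_PRICE_USD = 10.0
standard = Mode("standard", tokens_per_second=60.0,
                usd_per_million_output_tokens=BASE_PRICE_USD)
# 180 tokens/s is the reported typical speed; 2x pricing is the reported possibility.
high_speed = Mode("HighSpeed", tokens_per_second=180.0,
                  usd_per_million_output_tokens=2 * BASE_PRICE_USD)

WORKLOAD_TOKENS = 1_800_000  # hypothetical daily output volume for one team

for mode in (standard, high_speed):
    seconds, cost = job_estimate(mode, WORKLOAD_TOKENS)
    print(f"{mode.name:>9}: {seconds / 3600:.2f} h of decoding, ${cost:.2f}")
```

With real prices and measured speeds substituted in, the same calculation shows whether the time saved justifies the higher rate for a given workload.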
What Is Not Yet Known / Limitations
The exact architectural implementation of the acceleration remains unclear. It is also uncertain how much the higher rates could increase the Total Cost of Ownership (TCO) for large enterprises.
Author
Look at AI, Editorial Team
