Moonshot AI Releases Kimi Code CLI: Open-Source Multimodal Terminal Agent

Moonshot AI has introduced Kimi Code CLI, an open-source agent for terminal-based development. It runs on Kimi K2.6, a multimodal model with a Mixture-of-Experts architecture, and can analyze video input for debugging in addition to editing code.

What Happened

Moonshot AI released Kimi Code CLI under the MIT license. This terminal AI agent is based on the Kimi K2.6 model, which has approximately 1 trillion parameters. The tool can read and edit code, execute shell commands, and accept multimodal input, including text, images, video, and PDF. The CLI also supports the Model Context Protocol (MCP) for connecting to external tools and the Agent Client Protocol (ACP) for integrating with IDEs such as Zed and JetBrains.
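
To make the MCP part concrete: MCP messages are JSON-RPC 2.0 objects, and a client discovers and invokes tools with the `tools/list` and `tools/call` methods. The Python sketch below only builds such messages; it is not taken from Kimi Code CLI, and the tool name `run_tests` and its arguments are hypothetical placeholders.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask an MCP server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool. "run_tests" and its arguments are hypothetical,
# chosen only to illustrate the message shape.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "run_tests", "arguments": {"path": "tests/"}},
)

print(json.dumps(call_tool, indent=2))
```

A real client would send these objects over a transport such as the server's standard input and read the matching responses; that part is omitted here.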

Context

Unlike traditional autocomplete tools, Kimi Code CLI works as an agent backed by a Mixture-of-Experts (MoE) model. An MoE model routes each token to a small subset of expert sub-networks, so a very large total parameter count does not have to be fully active at every step. The CLI supports sub-agents (coder, researcher, planner) and includes lifecycle hooks for automatically auditing the agent's decisions. The use of open protocols like MCP and ACP aims to standardize interaction between autonomous agents and development environments.
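
The general idea behind a Mixture-of-Experts layer can be sketched as top-k routing: a gate scores every expert, only the k best-scoring experts run, and their outputs are blended by normalized gate weights. The toy Python below illustrates that technique only; the experts, gate scores, and the choice of k = 2 are made up, and nothing here describes how Kimi K2.6 actually routes tokens.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Four toy "experts"; in a real model each is a feed-forward network.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # made-up gate scores for one token

selected = route_top_k(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
print(selected, output)
```

Only two of the four experts are evaluated for this input, which is why an MoE model's compute per token tracks its active parameters rather than its total size.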

Why It Matters for the Industry

The emergence of a strong open-source player in the terminal agent segment increases competition with proprietary solutions like Claude Code. This stimulates the adoption of industry standards (MCP/ACP) and lowers the barrier to entry for creating specialized custom agents, transforming the CLI from a simple interface into a full-fledged platform for automating engineering tasks.

Why It Matters for Users

Developers gain an open-source, MIT-licensed alternative to paid solutions. A key advantage is multimodality: users can "show" the agent a screen recording of a bug, using video as direct context for generating fixes. The tool can be extended with custom skills and sub-agents, so workflows can be built directly within the terminal.

What Is Not Yet Known / Limitations

At this time, questions remain regarding latency when working with heavy multimodal data, the inference cost of a model of this scale, and data security during screen capture.

Author

Look at AI, Editorial Team