A large-scale analysis of more than 112,000 commits across 28 public repositories found that code produced by AI agents does not contain more errors than code written by humans. On the contrary, agents used in a human-in-the-loop mode (T2 level) showed a lower risk of introducing defects, and their code lasted longer in the codebase.


What Happened
A study of 112,382 commits found that T2-level agents (human-managed tools such as Claude Code) have an odds ratio of 0.57 for introducing bugs relative to purely human-written code. In other words, the odds that a commit introduces a bug are about 43% lower. AI-generated code also stays in projects 17.9% longer on average, which indicates higher survivability.
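To make the odds ratio concrete, here is a minimal Python sketch of how such a figure is computed and read. Only the value 0.57 comes from the study as reported above; the function names, the 2x2 counts, and the 20% baseline bug rate are hypothetical and chosen purely for illustration.

```python
def odds_ratio(bug_a, clean_a, bug_b, clean_b):
    """Odds ratio of group A relative to group B from a 2x2 table of commit counts."""
    return (bug_a / clean_a) / (bug_b / clean_b)


def risk_from_odds_ratio(baseline_risk, ratio):
    """Turn a baseline probability and an odds ratio into the implied probability."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * ratio
    return new_odds / (1 + new_odds)


REPORTED_OR = 0.57  # value reported in the article

# Hypothetical counts (NOT from the study) that happen to yield the same ratio:
# 57 buggy vs 400 clean agent commits, 100 buggy vs 400 clean human commits.
example_or = odds_ratio(57, 400, 100, 400)

# Reduction in odds implied by the reported ratio.
odds_reduction = 1 - REPORTED_OR

# Hypothetical baseline (NOT from the study): 20% of human commits introduce a bug.
implied_risk = risk_from_odds_ratio(0.20, REPORTED_OR)

print(f"example odds ratio: {example_or:.2f}")
print(f"odds reduction: {odds_reduction:.0%}")
print(f"implied bug rate at a 20% baseline: {implied_risk:.1%}")
```

Under the assumed 20% baseline, the implied rate for agent commits is roughly 12.5%. The point of the sketch is that an odds ratio describes a change in odds, not in probability, so a "43% lower" figure should not be read directly as 43% fewer buggy commits.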
Context
The industry has long assumed that AI is a source of technical debt and low-quality code. This analysis examines real Pull Requests and identifies the combination of an AI agent and human review as the key success factor, with the human acting as quality controller.
Why It Matters for the Industry
The results debunk the myth of low-quality AI code and let companies shift their focus from the question "can AI write code" to building effective workflows for human-machine review. In the long term, this will lead to agentic processes (T2/T3) becoming the baseline for development and to the programmer's role shifting toward that of an architect and reviewer of AI agents.
Why It Matters for Users
Developers should stop fearing tools like Cursor or Claude Code. The empirical data indicate that, with a human-in-the-loop approach, AI assistants can even increase the overall stability of the codebase through more precise and localized fixes.
What Remains Unknown / Limitations
The study focuses on technical efficiency and code quality. It does not address the legal status of generated code, including intellectual property (IP) and copyright.
Author
Look at AI, Editorial Team
