Linear's AI Agent Failure: Why Text Quality Assessment is Not Enough

An analysis of the case where Linear's AI agent sent incorrect emails to a client six times, highlighting the need to move from evaluating generation to fact-checking.

Compiled by Sergey Kostenchuk. Published 2026-06-22. Updated 2026-06-23.

2026-06-22 Coding


This article analyzes a case in which Linear's AI agent sent incorrect emails to a client six times. The main conclusion: agent errors in sales stem not from the quality of the generated text (generation) but from the lack of fact-checking against the actual state (state verification).

🌍 A paradigm shift is occurring in AI agent evaluation, moving from "LLM-as-a-judge" (assessing text quality) to verifying compliance with a "state contract."

👤 During development, it is crucial to focus not on the bot's politeness, but on whether it verifies critical data before performing an action.
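The difference between judging text and checking a state contract can be sketched in a few lines. The example below is a minimal illustration, not the implementation described in the source: the `CrmRecord` and `DraftEmail` types, their fields, and the `weekly_cap` rule are all hypothetical and stand in for whatever facts a real agent would need to confirm before sending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrmRecord:
    """Hypothetical snapshot of what the system of record says about a client."""
    email: str
    is_customer: bool
    emails_sent_this_week: int


@dataclass(frozen=True)
class DraftEmail:
    """Hypothetical outbound email proposed by the agent."""
    to: str
    assumes_customer: bool  # whether the text treats the recipient as a customer


def contract_violations(
    draft: DraftEmail, record: CrmRecord, weekly_cap: int = 1
) -> list[str]:
    """Return every way the draft contradicts the recorded state.

    An empty list means the (assumed) state contract holds and the
    email may be sent. Text quality is deliberately not inspected here.
    """
    violations = []
    if draft.to != record.email:
        violations.append("recipient does not match the record")
    if draft.assumes_customer != record.is_customer:
        violations.append("draft's customer status contradicts the record")
    if record.emails_sent_this_week >= weekly_cap:
        violations.append("weekly send cap already reached")
    return violations
```

A perfectly polite, well-written draft can still return a non-empty list here, which is the kind of failure a text-quality judge would not see. In an evaluation suite, the assertion would be on this list rather than on a style score.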

Source 1: https://tenureai.dev/writing/why-most-ai-evals-would-miss-the-linear-sales-email-failure/
