Rubric: A framework for testing AI agent behavior

Rubric is a new Python framework that evaluates the internal actions of AI agents rather than just their text responses.

Compiled by Sergey Kostenchuk. Published 2026-06-12. Updated 2026-06-12.

Source: Show HN: Rubric – test what your LLM agent did, not just what it said

Rubric, a Python framework for evaluating AI agents, has been introduced. It analyzes what happens inside a run: tool calls, their arguments, action sequences, and the quality of reasoning traces.

🌍 It targets "invisible" regressions, where the text response is still correct but the tool-usage logic behind it is broken.

👤 It shifts testing from checking for "pretty answers" to verifying that autonomous agents operate reliably.
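
To make the idea of an "invisible" regression concrete, here is a minimal sketch in plain Python. It does not use Rubric's API, which this brief does not describe; ToolCall, AgentRun, text_matches, behavior_issues, and the refund scenario are all hypothetical names invented for illustration. It shows two runs that produce the same final text while only one of them passes a check on the tool calls and their arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical trace types for illustration only; not Rubric's data model.
@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class AgentRun:
    final_text: str
    tool_calls: List[ToolCall] = field(default_factory=list)


def text_matches(run: AgentRun, expected_text: str) -> bool:
    """Output-only check: looks at what the agent said."""
    return run.final_text == expected_text


def behavior_issues(run: AgentRun, expected: List[ToolCall]) -> List[str]:
    """Behavior check: compares tool order first, then each call's arguments."""
    actual_names = [call.name for call in run.tool_calls]
    expected_names = [call.name for call in expected]
    if actual_names != expected_names:
        return [f"tool sequence {actual_names} != expected {expected_names}"]
    issues = []
    for index, (got, want) in enumerate(zip(run.tool_calls, expected)):
        if got.args != want.args:
            issues.append(
                f"call {index} ({got.name}): args {got.args} != expected {want.args}"
            )
    return issues


EXPECTED_TEXT = "Refund of $20 issued for order 1234."
EXPECTED_CALLS = [
    ToolCall("lookup_order", {"order_id": "1234"}),
    ToolCall("issue_refund", {"order_id": "1234", "amount": 20}),
]

# A run that says and does the right thing.
good_run = AgentRun(EXPECTED_TEXT, list(EXPECTED_CALLS))

# A run with identical text but a wrong refund amount in the tool call.
regressed_run = AgentRun(
    EXPECTED_TEXT,
    [
        ToolCall("lookup_order", {"order_id": "1234"}),
        ToolCall("issue_refund", {"order_id": "1234", "amount": 200}),
    ],
)

if __name__ == "__main__":
    for label, run in (("good", good_run), ("regressed", regressed_run)):
        print(label, text_matches(run, EXPECTED_TEXT), behavior_issues(run, EXPECTED_CALLS))
```

Both runs pass the text check, so a test suite that only compares responses would miss the wrong refund amount; the behavior check reports it. A real framework would likely add looser matching (partial argument checks, allowed reorderings), which this sketch leaves out.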

Source 1: https://github.com/Kareem-Rashed/rubric-eval
