🛠 AWS has introduced GEDD — a tool for the qualitative evaluation of AI agent performance.
The project uses Grounded Theory methodology: experts identify specific errors through dialogue with the agent. Those findings are automatically converted into test pipelines and LLM judges (G-Eval) for integration into CI/CD via SageMaker MLflow.
🌍 It solves the problem of "blind testing" by turning the qualitative experience of experts into automated quantitative tests.
👤 It enables faster verification of AI agents against deep domain-specific errors, such as hallucinations in medicine or law.
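
A rough illustration of the idea, not GEDD's actual API: the sketch below shows how an expert-identified failure category might become a rubric prompt for an LLM judge and a pass/fail gate for CI. All names here (FailureCode, build_judge_prompt, evaluate, stub_judge) are hypothetical, and the judge is a stub rather than a real model call. The rubric is only loosely in the style of G-Eval; it leaves out the generated evaluation steps and probability-weighted scoring.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FailureCode:
    """A failure category an expert named while reviewing agent dialogues (hypothetical structure)."""
    name: str
    description: str


def build_judge_prompt(code: FailureCode, question: str, answer: str) -> str:
    """Render a rubric prompt that asks a judge for a 1-5 score on one failure category."""
    return (
        f"Evaluation criterion: {code.name}\n"
        f"Definition: {code.description}\n"
        "Score the answer from 1 (criterion clearly violated) to 5 (no violation).\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Score:"
    )


# A judge maps a rendered prompt to an integer score; in practice this would wrap an LLM call.
Judge = Callable[[str], int]


def evaluate(
    cases: List[Tuple[str, str]],
    codes: List[FailureCode],
    judge: Judge,
    threshold: float = 4.0,
) -> Tuple[Dict[str, float], bool]:
    """Return the mean score per failure category and whether every mean meets the threshold."""
    scores: Dict[str, float] = {}
    for code in codes:
        values = [judge(build_judge_prompt(code, q, a)) for q, a in cases]
        scores[code.name] = sum(values) / len(values)
    passed = all(score >= threshold for score in scores.values())
    return scores, passed


def stub_judge(prompt: str) -> int:
    """Stand-in for a real LLM judge: flags answers carrying the marker '[no source]'."""
    return 1 if "[no source]" in prompt else 5


if __name__ == "__main__":
    codes = [
        FailureCode(
            name="unsupported_claim",
            description="The answer states a dosage or legal rule not backed by the given context.",
        )
    ]
    cases = [
        ("What is the maximum daily dose?", "Per the attached label, 4 g per day."),
        ("Is this clause enforceable?", "Yes, always. [no source]"),
    ]
    scores, passed = evaluate(cases, codes, stub_judge)
    print(scores, "PASS" if passed else "FAIL")
```

In a CI job, the passed flag could decide the exit code, and the per-category means could be logged as metrics to a tracking server such as MLflow; that wiring is left out here to keep the sketch self-contained.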
