News

clawmark released — a CLI tool for A/B testing CLAUDE.md

Introducing clawmark — a Rust-based tool for comparative testing of CLAUDE.md instruction configurations using the SWE-bench Lite benchmark.

Compiled by Sergey Kostenchuk. Published 2026-06-18. Updated 2026-06-18.

2026-06-18 Coding Anthropic


🛠 clawmark has been released — a CLI tool written in Rust for conducting A/B testing of CLAUDE.md files.

The tool compares how well different instruction configurations perform on the SWE-bench Lite task set.
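To make the idea of an A/B comparison concrete, here is a minimal Python sketch of how per-task results for two CLAUDE.md variants could be summarized. It is not clawmark's implementation or output format: the function name `compare_variants`, the `Comparison` type, the task IDs, and the pass/fail values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    rate_a: float  # share of shared tasks resolved under variant A
    rate_b: float  # share of shared tasks resolved under variant B
    only_a: int    # tasks resolved by A but not by B
    only_b: int    # tasks resolved by B but not by A


def compare_variants(resolved_a: dict[str, bool],
                     resolved_b: dict[str, bool]) -> Comparison:
    """Summarize two runs over the tasks that both variants attempted.

    Hypothetical helper: each dict maps a task ID to whether that
    variant's patch resolved the task.
    """
    shared = sorted(resolved_a.keys() & resolved_b.keys())
    if not shared:
        raise ValueError("the two runs have no tasks in common")
    n = len(shared)
    return Comparison(
        rate_a=sum(resolved_a[t] for t in shared) / n,
        rate_b=sum(resolved_b[t] for t in shared) / n,
        only_a=sum(resolved_a[t] and not resolved_b[t] for t in shared),
        only_b=sum(resolved_b[t] and not resolved_a[t] for t in shared),
    )


if __name__ == "__main__":
    # Made-up results for four made-up task IDs, not real benchmark data.
    run_a = {"task-1": True, "task-2": False, "task-3": True, "task-4": False}
    run_b = {"task-1": True, "task-2": True, "task-3": False, "task-4": True}
    print(compare_variants(run_a, run_b))
```

Counting the tasks that only one variant resolves, alongside the overall rates, shows whether a difference comes from many tasks or just a few, which matters when the task set is small.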

🌍 It lets developers of LLM-based systems optimize system prompts empirically against a standardized benchmark rather than by intuition.

👤 It offers a quick way to check which version of an AI agent's instructions performs better on real-world programming tasks.

Source 1: https://github.com/emiliolugo/clawmark
