🛡 2,000 People Attempted to Hack an AI Assistant
Developer Fernando Irarrazaval ran an experiment at hackmyclaw.com in which participants tried to hack his AI assistant Fiu (built on Claude Opus) through prompt injection. Despite more than 6,000 attacks, no secret data was leaked.
🌍 Using a top-tier model makes an agent significantly more resilient to attacks. However, when the agent holds real access rights, relying on system-prompt instructions alone to protect them is still risky.
👤 The case study shows real-world attack techniques (role-playing, linguistic mimicry) and underscores the importance of choosing a strong model when automating critical tasks.
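The point about access rights can be illustrated with a minimal Python sketch, assuming a hypothetical agent host that executes the tool calls a model requests. `ToolGate`, `read_calendar`, and `read_secrets` are illustrative names, not part of Fiu or hackmyclaw.com. The idea is that permissions are checked in code, so a successful injection against the model still cannot unlock a tool the agent was never granted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class PermissionDenied(Exception):
    """Raised when the model requests a tool outside its granted rights."""


@dataclass
class ToolGate:
    # Hypothetical host-side gate: the set of tools the agent may run is
    # enforced here, independent of anything written in the system prompt.
    allowed: Set[str]
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        # Treat the model's tool request as untrusted input: this check holds
        # even if a prompt injection convinced the model to ask for the tool.
        if name not in self.allowed:
            raise PermissionDenied(f"tool '{name}' is not permitted")
        if name not in self.tools:
            raise KeyError(f"unknown tool '{name}'")
        return self.tools[name](**kwargs)


# Illustrative setup: only the calendar tool is granted.
gate = ToolGate(allowed={"read_calendar"})
gate.register("read_calendar", lambda: "2 meetings today")
gate.register("read_secrets", lambda: "TOP-SECRET")

print(gate.call("read_calendar"))  # prints "2 meetings today"
try:
    gate.call("read_secrets")
except PermissionDenied as err:
    print("blocked:", err)  # prints "blocked: tool 'read_secrets' is not permitted"
```

A strong model lowers the chance that the request for `read_secrets` is ever made; the gate covers the case where it is.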
Source 1: https://www.fernandoi.cl/posts/hackmyclaw/
