hackmyclaw experiment: 2,000 people tried to hack an AI assistant...

Developer Fernando Irarrázaval ran a large-scale experiment at hackmyclaw.com in which more than 2,000 users made over 6,000 attempts to hack his AI assistant, Fiu. Despite social engineering, role-playing, and multilingual prompts, the defenses built on Claude Opus 4.6 prevented any leak of the secret data.

What happened

As part of the experiment, users tried a range of prompt injection techniques to extract the contents of the secrets.env file. Attacks included manufactured panic situations, linguistic mimicry, and elaborate role-playing scenarios. The result indicates that powerful models can ignore most standard attempts to bypass system instructions and so keep the data confidential.

Context

The hackmyclaw.com project served as a test of how well a modern LLM withstands targeted attacks. The system ran on Claude Opus 4.6, which makes it a case study in how a model's capability relates to its ability to follow safety guidelines under aggressive external pressure.

Why it matters for the industry

For the AI industry, this case supports the idea of *Security by Model Capability*: the more capable the model, the more resilient it is to prompt injection. This creates a trade-off between security and economics, however, because using SOTA (state-of-the-art) models for protection significantly increases API costs and raises the risk of provider bans during mass attacks.

Why it matters for users

For users and automation developers, this case serves as a practical guide to real-world attack methods against AI agents. It shows that choosing a powerful model is not just a matter of response quality but also a critical security decision. Even so, relying on system instructions alone remains risky when the agent holds broad access permissions.

What remains unknown / limitations

A powerful model is an effective but expensive barrier. It does not replace sound architectural security and isolation of access privileges.
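
To make the idea of access privilege isolation concrete, here is a minimal Python sketch of a file-reading tool that enforces its own limits no matter what the model asks for. It is not how Fiu or hackmyclaw.com is implemented. The names `make_read_tool`, `read_file`, and `AccessDenied`, and the sandbox directory layout, are hypothetical and chosen only for illustration.

```python
from pathlib import Path


class AccessDenied(Exception):
    """Raised when a tool call targets a path the agent may not read."""


def make_read_tool(sandbox_root, blocked_names=("secrets.env",)):
    """Build a read-only file tool confined to sandbox_root (hypothetical helper).

    The check runs in ordinary code outside the model, so a successful
    prompt injection still cannot widen what the tool will return.
    """
    root = Path(sandbox_root).resolve()
    blocked = {name.lower() for name in blocked_names}

    def read_file(requested_path: str) -> str:
        # Resolve symlinks and ".." segments before checking anything.
        target = (root / requested_path).resolve()
        # Path.is_relative_to requires Python 3.9 or newer.
        if not target.is_relative_to(root):
            raise AccessDenied("path escapes the sandbox")
        if target.name.lower() in blocked:
            raise AccessDenied("file is on the blocklist")
        return target.read_text(encoding="utf-8")

    return read_file
```

With a tool like this, a request for `secrets.env` or `../outside.txt` raises `AccessDenied` even if the model has been persuaded to issue it. A stricter design would keep the secret out of the agent's reach entirely, so the blocklist is only a second line of defense.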

Sources

Fernando Irarrázaval

Author

Look at AI, Editorial