Is an AI Agent's Action Confirmation Prompt a Security Boundary?

Security researcher NikosRig has identified critical vulnerabilities in Nous Research's Hermes Agent that let an attacker bypass its command confirmation mechanisms. The incident has sparked a debate over whether confirmation prompts (human-in-the-loop) are a legitimate security defense or merely a user convenience feature.

What Happened

The research uncovered three ways to bypass action approval requests: injections into the "smart" confirmation mode, arbitrary code execution through startup hooks, and exploitation of shell command parsing errors. In response, Nous Research revised its security policy, reclassifying confirmation bypasses from critical vulnerabilities to "heuristic methods," and the related reports were closed.
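The disclosure's exploit details are not reproduced here. As a generic illustration of the parsing class of issue only, the sketch below shows how a confirmation check that inspects a command string can disagree with what the shell actually runs. The allowlist and both function names are hypothetical and are not taken from Hermes Agent's code.

```python
import shlex

# Hypothetical allowlist of commands that skip the approval prompt.
SAFE_COMMANDS = {"ls", "cat", "echo"}

# Characters the shell treats as control or redirection operators.
SHELL_PUNCTUATION = set("();<>|&")


def naive_needs_confirmation(command: str) -> bool:
    """Inspect only the first whitespace-separated word (flawed on purpose)."""
    words = command.split()
    return not words or words[0] not in SAFE_COMMANDS


def stricter_needs_confirmation(command: str) -> bool:
    """Also prompt when the string contains shell operators or substitutions."""
    if "\n" in command:
        return True  # a newline separates commands, just like ";"
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        return True  # unbalanced quotes: refuse to guess
    if not tokens or tokens[0] not in SAFE_COMMANDS:
        return True
    for token in tokens:
        if token and set(token) <= SHELL_PUNCTUATION:  # e.g. "&&", "|", ";"
            return True
        if "`" in token or "$" in token:  # command or variable substitution
            return True
    return False


chained = "ls && rm -rf /tmp/demo"
print(naive_needs_confirmation(chained))     # False: no prompt, yet rm would run
print(stricter_needs_confirmation(chained))  # True: the "&&" operator is detected
```

Even the stricter check remains a heuristic over text: it is deliberately conservative, and it cannot account for aliases, shell functions, or what an allowed program does once it starts. That limitation is the core of the argument for enforcing limits in the runtime rather than in the prompt.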

Context

The issue touches on a fundamental design principle of AI agents: using human-in-the-loop (HITL) mechanisms to control terminal actions. There is a technical gap between an interface confirmation (approval prompt), which operates at the user interaction level, and actual programmatic isolation (sandboxing), which enforces security at the runtime environment level.

Why It Matters for the Industry

The incident has fueled a dispute over security standards in the AI industry. Companies classify such bugs differently: some (like Anthropic with Claude Code) treat them as security issues, while others (like Nous Research) tend to classify them as model behavioral characteristics. This disagreement could slow the adoption of CVE standards for LLM agents and push developers toward more complex and resource-intensive isolation solutions, such as Docker, gVisor, or Firecracker, even for simple tools.

Why It Matters for Users

Users of AI agents with access to system resources cannot rely solely on confirmation pop-ups. An attacker can manipulate the context or use workarounds in shell commands to execute actions covertly. For safe operation, it is crucial to run agents only in isolated environments, such as containers or virtual machines.
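As one possible way to apply this advice, the sketch below builds a `docker run` command line that executes an agent-issued command with no network access, a read-only root filesystem, and no Linux capabilities. The function name, image, workspace path, and resource limits are illustrative assumptions, not settings recommended by any vendor mentioned here.

```python
import os


def build_sandboxed_command(
    agent_command: str,
    workspace: str,
    image: str = "python:3.12-slim",  # assumed image; any image with `sh` works
    writable: bool = False,
) -> list[str]:
    """Return a `docker run` argv that runs agent_command inside a container."""
    mode = "rw" if writable else "ro"
    host_dir = os.path.abspath(workspace)  # bind mounts need an absolute path
    return [
        "docker", "run", "--rm",
        "--network", "none",                       # no network access
        "--read-only",                             # read-only root filesystem
        "--cap-drop", "ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",     # block privilege escalation
        "--pids-limit", "128",                     # cap the number of processes
        "--memory", "512m",                        # cap memory usage
        "--volume", f"{host_dir}:/workspace:{mode}",
        "--workdir", "/workspace",
        image,
        "sh", "-c", agent_command,
    ]


cmd = build_sandboxed_command("ls && rm -rf /tmp/demo", "/tmp/agent-work")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine with Docker installed. The point of the design is that a chained or injected command still runs, but only inside the container: the limits hold regardless of whether a prompt was shown or bypassed.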

What Remains Unknown / Limitations

In the ongoing discussion, a divide remains between technical specialists, who focus on the bypass methods themselves (injections, parsing errors), and the business community, which emphasizes standardizing responsibility and vulnerability classification.

Sources

NikosRig Disclosure

Author

Look at AI, Editorial Staff