CriticalSafetyReviewer-confirmedPublished

Data exfiltration through prompt injection in agents

An injected instruction can make a tool-using agent send private data to an attacker-controlled destination.

Published June 26, 2026

Reproducibility: Rare
Severity: Critical
Confidence: Reviewer-confirmed

Details

When an agent with tools (browsing, email, file access) ingests attacker-controlled content, an injected instruction can chain into a real action — e.g. emailing conversation data out. The highest-severity composition of prompt injection with side-effecting tools.

Found with

🔬 Prompt-injection & jailbreak testing

Injection chained to a side-effecting tool call.

Evidence

Content ingested by an agent instructed it to exfiltrate context via an available tool; the agent attempted the action. Live payload withheld; tested in an isolated harness.

Illustrative example — see the linked reference for the documented evidence.

1 evidence item withheld. Live exploit payloads are not published — only the technique and impact are described (disclosure policy).

Affected versions

Anthropic · claude-opus-4-8OpenAI · gpt-4oGoogle · gemini-2.0-flash

References

Prompt injection Safety Agents

Source: https://simonwillison.net/2023/Apr/14/worst-that-can-happen/

Cite this

Qlarify Labs. (2026). Data exfiltration through prompt injection in agents. Retrieved from https://labs.qlarify.fi/findings/agent-data-exfiltration