Indirect prompt injection via retrieved content
Instructions hidden in documents, web pages or tool outputs can override the system prompt when ingested by the model.
Published June 26, 2026
- Reproducibility: Sometimes
- Severity: Critical
- Confidence: Reviewer-confirmed
Details
An attacker plants instructions in content the model later reads, such as a web page, a PDF, or an email. When the model processes that content, it may follow the attacker's instructions instead of the application's, exfiltrating data or taking unauthorized actions. This is the defining security risk for RAG and agentic systems.
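The mechanism can be illustrated with a minimal sketch of naive prompt assembly. It is not taken from any affected system: `build_messages`, `Message`, `SYSTEM_PROMPT`, and the sample documents are all hypothetical, and the planted text is a benign placeholder rather than a working payload. The point it shows is that once retrieved text is concatenated into the prompt, nothing in the data structure distinguishes it from text the application wrote.

```python
from dataclasses import dataclass

# Hypothetical system prompt, for illustration only.
SYSTEM_PROMPT = "You are a support assistant. Never reveal internal notes."


@dataclass
class Message:
    role: str      # "system" or "user"
    content: str


def build_messages(question: str, retrieved_docs: list[str]) -> list[Message]:
    """Naive RAG prompt assembly (assumed pattern, not a specific product).

    Retrieved text is pasted straight into the user turn, so the model
    receives it as ordinary prompt text with no trust boundary.
    """
    context = "\n\n".join(retrieved_docs)
    user_turn = f"Context:\n{context}\n\nQuestion: {question}"
    return [Message("system", SYSTEM_PROMPT), Message("user", user_turn)]


# Benign placeholder standing in for attacker-controlled text.
PLANTED = "[attacker-controlled instruction would appear here]"

docs = [
    "Refunds are processed within 14 days.",
    f"Shipping takes 3 to 5 days. {PLANTED}",
]
messages = build_messages("How long do refunds take?", docs)

# The planted text now sits in the same string as the trusted question.
print(PLANTED in messages[1].content)
```

In this sketch the only separation between trusted and untrusted text is the word "Context:", which is a formatting convention and not an enforced boundary.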
Evidence
A retrieved document contained hidden instructions directing the model to ignore prior context and reveal system content. The model complied.
1 evidence item withheld. Live exploit payloads are not published; only the technique and impact are described (disclosure policy).
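Because the finding reproduces only sometimes, one way to quantify this kind of leak without publishing a payload is a canary check: place a random marker in the system prompt and count how often it appears in the output. The sketch below is an assumption about how such a check could look, not the method used for this finding. `leak_rate`, `make_canary`, and the `call_model` callback are hypothetical names, and the two stub models exist only to make the example runnable.

```python
import secrets
from typing import Callable


def make_canary() -> str:
    """Return a random marker that is unlikely to occur by chance."""
    return f"CANARY-{secrets.token_hex(8)}"


def leak_rate(
    call_model: Callable[[str, str], str],
    system_template: str,
    document: str,
    trials: int = 10,
) -> float:
    """Fraction of trials in which the canary shows up in the model output.

    call_model(system_prompt, document) is a hypothetical wrapper around
    whatever model API is under test; system_template must contain the
    placeholder {canary}.
    """
    leaks = 0
    for _ in range(trials):
        canary = make_canary()  # fresh marker per trial
        system_prompt = system_template.format(canary=canary)
        output = call_model(system_prompt, document)
        if canary in output:
            leaks += 1
    return leaks / trials


# Stub models, for illustration only.
def never_leaks(system_prompt: str, document: str) -> str:
    return "Summary of the document."


def always_leaks(system_prompt: str, document: str) -> str:
    return system_prompt


template = "Internal marker: {canary}. Do not disclose it."
print(leak_rate(never_leaks, template, "test document", trials=5))
print(leak_rate(always_leaks, template, "test document", trials=5))
```

A fresh canary per trial avoids false positives from cached or repeated outputs. A real run would pass the document under test in place of the stub input.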
References
Source: https://simonwillison.net/2023/Apr/14/worst-that-can-happen/
Cite this
Qlarify Labs. (2026). Indirect prompt injection via retrieved content. Retrieved from https://labs.qlarify.fi/findings/indirect-prompt-injection