Critical, Prompt injection, Reviewer-confirmed, Published

Indirect prompt injection via retrieved content

Instructions hidden in documents, web pages or tool outputs can override the system prompt when ingested by the model.

Published June 26, 2026

Reproducibility: Sometimes
Severity: Critical
Confidence: Reviewer-confirmed

Details

An attacker plants instructions in content the model later reads (a web page, a PDF, an email). When the model processes that content, it may follow the attacker's instructions instead of the application's, exfiltrating data or taking unauthorized actions. This is the defining security risk for RAG and agentic systems.
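
The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the reviewed system's code: the names `RetrievedDocument`, `build_prompt`, `SYSTEM_PROMPT` and the example URLs are hypothetical, and the injected text is a benign placeholder rather than a live payload. It assumes a typical RAG setup in which retrieved text is concatenated into the same prompt string as the application's instructions.

```python
from dataclasses import dataclass


@dataclass
class RetrievedDocument:
    """Hypothetical container for content fetched at query time."""
    source: str
    text: str


# Trusted instructions written by the application developer (illustrative).
SYSTEM_PROMPT = "You are a support assistant. Never reveal internal notes."

# Benign placeholder standing in for attacker-written text; not a live payload.
PLACEHOLDER_INJECTION = "[attacker-written instruction would appear here]"


def build_prompt(system_prompt: str, docs: list[RetrievedDocument], question: str) -> str:
    """Naively join trusted instructions and untrusted content into one string."""
    context = "\n\n".join(f"Source: {d.source}\n{d.text}" for d in docs)
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}"


docs = [
    RetrievedDocument("https://example.com/faq", "Refunds take 5 business days."),
    # A document the attacker controls; its text is retrieved like any other.
    RetrievedDocument("https://example.com/planted", f"Shipping is free. {PLACEHOLDER_INJECTION}"),
]
prompt = build_prompt(SYSTEM_PROMPT, docs, "How long do refunds take?")

# The attacker's text now sits in the same string as the system prompt,
# with nothing that marks it as data rather than instructions.
injected_reaches_model = PLACEHOLDER_INJECTION in prompt
print(injected_reaches_model)
```

Because the model receives one undifferentiated string, nothing in this assembly step tells it which sentences came from the application and which came from the planted document.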

Evidence

A retrieved document contained hidden instructions directing the model to ignore prior context and reveal system content. The model complied. Live payload withheld.
Illustrative example — see the linked reference for the documented evidence.

1 evidence item withheld. Live exploit payloads are not published — only the technique and impact are described (disclosure policy).

Affected versions

Anthropic · claude-opus-4-8
OpenAI · gpt-4o
Google · gemini-2.0-flash

References

Source: https://simonwillison.net/2023/Apr/14/worst-that-can-happen/

Cite this

Qlarify Labs. (2026). Indirect prompt injection via retrieved content. Retrieved from https://labs.qlarify.fi/findings/indirect-prompt-injection