Prompt-injection & jailbreak testing
Embed adversarial instructions in user input or retrieved/tool content to test whether the model follows attacker text over its system policy.
Published June 26, 2026
How it works
Prompt injection is the defining security failure mode of LLM applications. Testing covers direct injection (the user tries to override the system prompt) and indirect injection (malicious instructions hidden in documents, web pages, or tool outputs the model ingests). For agentic systems this is the highest-severity surface.
When to use it
Any system that ingests untrusted content (RAG, browsing, email, tool outputs) or grants the model side effects.
Limitations
The attack space is open-ended; a clean test run is not proof of safety. Per Qlarify Labs policy, live payloads are redacted in published findings.
Method yield
- Findings
- 4
- Versions spanned
- 4
- Yield score
- 18
Severity-weighted across the published findings below. Why we measure this →
Findings it surfaces (4)
Documented failures this method catches — the evidence it works.
- Indirect prompt injection via retrieved contentCritical
Instructions hidden in documents, web pages or tool outputs can override the system prompt when ingested by the model.
How it found it: Indirect payload embedded in a retrieved document.
Prompt injection - Roleplay-based safety bypassHigh
Framing a disallowed request as fiction or a persona can induce the model to bypass its safety policy.
Jailbreak - Safety bypass via unicode/homoglyph obfuscationHigh
Disallowed content encoded with look-alike unicode or spacing can slip past safety filters.
How it found it: Obfuscated payload evades surface filters.
Jailbreak - Data exfiltration through prompt injection in agentsCritical
An injected instruction can make a tool-using agent send private data to an attacker-controlled destination.
How it found it: Injection chained to a side-effecting tool call.
Safety
References & further reading
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Greshake et al. · arXiv · February 1, 2023
- Prompt injection: what’s the worst that can happen?
Simon Willison · Simon Willison’s Weblog · April 14, 2023
- OWASP Top 10 for Large Language Model Applications
OWASP · October 1, 2023
Cite this
Qlarify Labs. (2026). Prompt-injection & jailbreak testing. Retrieved from https://labs.qlarify.fi/methods/prompt-injection-testing