Testing & Findings
Findings
Documented limitations, weaknesses and failures of AI systems — evidence-first and linked to the method that found each one. Public entries are reviewed before publishing.
2 findings
- Critical | Safety | Reviewer-confirmed | Repro: Rare
  Data exfiltration through prompt injection in agents
  An injected instruction can make a tool-using agent send private data to an attacker-controlled destination.
  Method: 🔬 Prompt-injection & jailbreak testing. Tags: Prompt injection, Safety, Agents
- Critical | Prompt injection | Reviewer-confirmed | Repro: Sometimes
  Indirect prompt injection via retrieved content
  Instructions hidden in documents, web pages or tool outputs can override the system prompt when ingested by the model.
  Method: 🔬 Prompt-injection & jailbreak testing. Tags: Prompt injection, Safety, RAG