Testing & Findings
Findings
Documented limitations, weaknesses and failures of AI systems — evidence-first and linked to the method that found each one. Public entries are reviewed before publishing.
2 findings
- CriticalSafetyReviewer-confirmedRepro: Rare
Data exfiltration through prompt injection in agents
An injected instruction can make a tool-using agent send private data to an attacker-controlled destination.
🔬 Prompt-injection & jailbreak testingPrompt injectionSafetyAgents - CriticalPrompt injectionReviewer-confirmedRepro: Sometimes
Indirect prompt injection via retrieved content
Instructions hidden in documents, web pages or tool outputs can override the system prompt when ingested by the model.
🔬 Prompt-injection & jailbreak testingPrompt injectionSafetyRAG