Testing & Findings

Findings

Documented limitations, weaknesses and failures of AI systems — evidence-first and linked to the method that found each one. Public entries are reviewed before publishing.

CategorySeverityMethodTagModelSearchClear

5 findings

CriticalSafetyReviewer-confirmedRepro: Rare
Data exfiltration through prompt injection in agents
An injected instruction can make a tool-using agent send private data to an attacker-controlled destination.
🔬 Prompt-injection & jailbreak testingPrompt injectionSafetyAgents
CriticalPrompt injectionReviewer-confirmedRepro: Sometimes
Indirect prompt injection via retrieved content
Instructions hidden in documents, web pages or tool outputs can override the system prompt when ingested by the model.
🔬 Prompt-injection & jailbreak testingPrompt injectionSafetyRAG
HighTool useReviewer-confirmedRepro: Sometimes
Hallucinated tool/function arguments
When calling tools, models invent argument values or call functions that weren't provided.
🔬 Property-based testing🔬 Differential testing🔬 Integration testing (MCP handshakes & tool contracts)Tool useAgents
MediumTool useReviewer-confirmedRepro: Sometimes
Format-constraint violations under strict schemas
Asked for strictly-formatted output (e.g. JSON to a schema), models emit invalid or extra content.
🔬 Property-based testing🔬 Boundary & edge-case testing🔬 Unit testing the deterministic scaffold🔬 Smoke testing in CI/CDTool useAgents
MediumTool useVendor-acknowledgedRepro: Often
Reasoning model regresses on tool use versus its base model
DeepSeek-R1 falls short of the base DeepSeek-V3 on function calling, multi-turn, complex role-play and JSON output — a reasoning-tuned model trading away tool-use reliability, later restored in R1-0528.
🔬 Integration testing (MCP handshakes & tool contracts)🔬 Property-based testing🔬 Differential testingTool useAgents

Data exfiltration through prompt injection in agents

Indirect prompt injection via retrieved content

Hallucinated tool/function arguments

Format-constraint violations under strict schemas

Reasoning model regresses on tool use versus its base model