Testing & Findings
Findings
Documented limitations, weaknesses, and failures of AI systems, presented evidence-first and each linked to the method that found it. Public entries are reviewed before publication.
2 findings
- High · Jailbreak · Reviewer-confirmed · Repro: Sometimes
Roleplay-based safety bypass
Framing a disallowed request as fiction or a persona can induce the model to bypass its safety policy.
Methods: 🔬 Prompt-injection & jailbreak testing · 🔬 Adversarial prompting
Tags: Jailbreak · Safety
- High · Jailbreak · Reviewer-confirmed · Repro: Sometimes
Safety bypass via Unicode/homoglyph obfuscation
Disallowed content written with look-alike Unicode characters (homoglyphs) or with inserted spacing can slip past safety filters.
Methods: 🔬 Glitch-token & Unicode fuzzing · 🔬 Prompt-injection & jailbreak testing
Tags: Jailbreak · Safety
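The following minimal Python sketch shows why look-alike characters defeat exact-match filtering. It assumes a simple substring blocklist. The blocked term `forbidden`, the `CONFUSABLES` table, and the functions `naive_match`, `normalize`, and `normalized_match` are hypothetical placeholders and are not taken from any reviewed system. The table covers only a few Cyrillic letters, while a real filter would draw on the full Unicode confusables data (UTS #39). NFKC normalization folds compatibility forms such as fullwidth letters, but it does not map Cyrillic homoglyphs to Latin, so a separate lookup is still needed.

```python
import unicodedata

# Hypothetical, tiny subset of a confusables map (Cyrillic -> Latin).
# A real filter would load the full Unicode TR39 confusables data instead.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}

BLOCKLIST = {"forbidden"}  # hypothetical placeholder term


def naive_match(text: str) -> bool:
    """Exact substring check after lowercasing only (easy to evade)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def normalize(text: str) -> str:
    """Fold common obfuscations into a canonical form before matching."""
    # NFKC folds compatibility forms, e.g. fullwidth letters, to ASCII.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Drop format characters (category Cf), e.g. zero-width space and joiner.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Map known look-alike letters to their Latin counterparts.
    text = "".join(CONFUSABLES.get(ch, ch) for ch in text)
    # Remove whitespace so letter-spaced variants still match.
    return "".join(text.split())


def normalized_match(text: str) -> bool:
    """Substring check on the normalized text."""
    folded = normalize(text)
    return any(term in folded for term in BLOCKLIST)


if __name__ == "__main__":
    samples = [
        "f\u043erbidden",  # Cyrillic small o in place of Latin o
        "\uff46\uff4f\uff52\uff42\uff49\uff44\uff44\uff45\uff4e",  # fullwidth letters
        "for\u200bbidden",  # zero-width space inserted
        "f o r b i d d e n",  # letter spacing
    ]
    for s in samples:
        print(repr(s), naive_match(s), normalized_match(s))
```

Collapsing all whitespace catches letter-spaced variants, but it also adds false positives across word boundaries. Normalization of this kind narrows the gap for obfuscated input and does not replace model-level safety checks.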