Testing & Findings
Testing methods
How to find the limits of AI systems. Each method is backed by the real findings it has surfaced — and ranked by how much it surfaces, weighted by severity. The durable knowledge is the technique, not the patched-away bug.
Most productive methods
- 1Differential testing7 findings · 7 versions
- 2Prompt-injection & jailbreak testing4 findings · 4 versions
- 3Boundary & edge-case testing7 findings · 6 versions
Ranked by a severity-weighted yield score. Why we measure this →
5 methods
Adversarial prompting
Deliberately craft inputs designed to elicit failure — confusion, unsafe output, or broken constraints — to map the model's weak boundaries.
Canary releases & staged rollout
Route a small slice of real traffic to a new model or prompt first, watch it closely, and widen or roll back based on what the canary shows.
Distillation & model-extraction probing
Probe whether a deployed model can be cheaply queried to reconstruct its behaviour, training data, or a usable distilled copy — a confidentiality and IP attack surface.
Glitch-token & unicode fuzzing
Feed anomalous tokens, rare unicode, homoglyphs and malformed encodings to trigger out-of-distribution behavior.
Prompt-injection & jailbreak testing
Embed adversarial instructions in user input or retrieved/tool content to test whether the model follows attacker text over its system policy.