Testing & Findings
Testing methods
How to find the limits of AI systems. Each method is backed by the real findings it has surfaced — and ranked by how much it surfaces, weighted by severity. The durable knowledge is the technique, not the patched-away bug.
Most productive methods
- 1. Differential testing: 7 findings · 7 versions
- 2. Prompt-injection & jailbreak testing: 4 findings · 4 versions
- 3. Boundary & edge-case testing: 7 findings · 6 versions
Ranked by a severity-weighted yield score.
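As a rough illustration of how a severity-weighted yield score can rank a method with fewer findings above one with more, here is a minimal sketch. The weights, the method names, and the sum-of-weights formula are all assumptions made for the example; the page does not state the actual scoring formula.

```python
# Hypothetical severity weights; the real weighting is not given on the page.
SEVERITY_WEIGHTS = {"low": 1, "medium": 3, "high": 9}


def yield_score(severities):
    """Sum the weight of every finding a method has surfaced."""
    return sum(SEVERITY_WEIGHTS[s] for s in severities)


def rank_methods(findings_by_method):
    """Return method names ordered from highest to lowest score."""
    return sorted(
        findings_by_method,
        key=lambda name: yield_score(findings_by_method[name]),
        reverse=True,
    )


# Illustrative data only, not the site's real findings.
findings = {
    "method_a": ["low"] * 7,                         # many minor findings
    "method_b": ["high", "high", "medium", "low"],   # fewer, more severe
}
ranking = rank_methods(findings)
```

Under these assumed weights, `method_b` ranks first despite having four findings to `method_a`'s seven.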
5 methods
Chaos engineering for AI systems
Deliberately inject failures — tool timeouts, malformed tool responses, truncated context, adversarial inputs — to test whether the system degrades gracefully and recovers.
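One way to try this in a test harness is to wrap a tool so that some calls fail on purpose, then check that the calling code returns a controlled result instead of crashing. The sketch below is only an assumption about what such a harness might look like: `ToolTimeout`, `flaky`, `answer_with_fallback`, and `lookup` are hypothetical names, and the "system under test" is a toy stand-in.

```python
import random


class ToolTimeout(Exception):
    """Raised by the fault injector to simulate a tool timing out."""


def flaky(tool, failure_rate, rng):
    """Wrap a tool so that a fraction of calls raise ToolTimeout."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ToolTimeout("injected timeout")
        return tool(*args, **kwargs)
    return wrapped


def answer_with_fallback(tool, query, retries=2):
    """Toy system under test: retry, then degrade to a safe message."""
    for _ in range(retries + 1):
        try:
            return {"ok": True, "result": tool(query)}
        except ToolTimeout:
            continue
    return {"ok": False, "result": "tool unavailable"}


def lookup(query):
    """Stand-in for a real tool call."""
    return query.upper()


rng = random.Random(0)
chaotic_lookup = flaky(lookup, failure_rate=0.5, rng=rng)
outcomes = [answer_with_fallback(chaotic_lookup, "ping") for _ in range(200)]

# Graceful degradation: every call returns a well-formed result, never a crash.
degraded = sum(1 for outcome in outcomes if not outcome["ok"])
```

The same wrapper pattern extends to malformed responses or truncated context by replacing the raised exception with a corrupted return value.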
Distillation & model-extraction probing
Probe whether a deployed model can be cheaply queried to reconstruct its behaviour, training data, or a usable distilled copy — a confidentiality and IP attack surface.
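To make the idea of "cheap" reconstruction concrete, the sketch below treats a toy black-box classifier as the target, spends a small query budget to rebuild its decision rule, and then measures how often the copy agrees with the original. Everything here is hypothetical: `target_model`, its hidden cutoff, and the binary-search strategy are stand-ins chosen for illustration, and real models are far harder to copy.

```python
def target_model(x):
    """Stand-in for a deployed black-box model with a hidden cutoff."""
    return 1 if x >= 0.62 else 0


def extract_threshold(query, budget):
    """Binary-search the decision boundary using exactly `budget` queries."""
    lo, hi = 0.0, 1.0  # assumes query(0.0) == 0 and query(1.0) == 1
    for _ in range(budget):
        mid = (lo + hi) / 2
        if query(mid) == 1:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def fidelity(surrogate, target, inputs):
    """Fraction of inputs on which the surrogate matches the target."""
    agree = sum(surrogate(x) == target(x) for x in inputs)
    return agree / len(inputs)


estimate = extract_threshold(target_model, budget=12)


def surrogate(x):
    """Reconstructed copy built only from query responses."""
    return 1 if x >= estimate else 0


held_out = [i / 1000 for i in range(1001)]
score = fidelity(surrogate, target_model, held_out)
```

The quantity of interest is fidelity per query: a high score from a small budget signals that the deployed behaviour is easy to reconstruct.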
Hallucination triggering
Deliberately steer the model toward fabrication, for example by asking about non-existent entities or facts beyond its knowledge, to map where it invents instead of declining.
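A minimal probe might look like the sketch below: ask about entities invented for the test, then count how often the model answers confidently instead of declining. The prompts name made-up works and events, `stub_model` is a toy stand-in for a real model call, and the keyword-based `declines` check is a crude assumption that a real harness would replace with a stronger judge.

```python
# Phrases treated as a refusal to fabricate (a crude, assumed heuristic).
DECLINE_MARKERS = ("i don't know", "not aware", "no record", "cannot find")


def declines(response):
    """True if the response contains any decline marker."""
    text = response.lower()
    return any(marker in text for marker in DECLINE_MARKERS)


def fabrication_rate(model, prompts):
    """Return the share of prompts answered without declining, plus those prompts."""
    fabricated = [p for p in prompts if not declines(model(p))]
    return len(fabricated) / len(prompts), fabricated


def stub_model(prompt):
    """Toy stand-in: it declines only when the prompt mentions a treaty."""
    if "treaty" in prompt.lower():
        return "I'm not aware of any such treaty."
    return "It was published in 1987 and won several awards."


# All three entities are invented for this test.
prompts = [
    "Summarise the novel 'The Glass Cartographer of Venn'.",
    "What did the 1903 Treaty of Quillhaven establish?",
    "Who won the 2011 Zorbleth Prize for Applied Topology?",
]
rate, fabricated = fabrication_rate(stub_model, prompts)
```

Grouping the fabricated prompts by topic or phrasing is what turns the rate into a map of where the model invents.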
Perturbation testing
Apply small, meaning-preserving changes to an input — typos, spacing, paraphrase, reordering — and check that the output stays stable. When it doesn't, you've measured brittleness.
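The check can be sketched as a stability score: the fraction of perturbed inputs that yield the same output as the original. The perturbation set and the deliberately brittle toy classifier below are assumptions chosen to make the failure visible; a real harness would call the model under test and likely compare outputs semantically, not by exact equality.

```python
import random


def swap_typo(text, rng):
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def perturbations(text, rng):
    """Small edits intended to preserve meaning."""
    return [
        text.replace(" ", "  "),   # doubled spacing
        text.upper(),              # case change
        " " + text + " ",          # leading and trailing whitespace
        swap_typo(text, rng),      # one-character typo
    ]


def stability(model, text, rng):
    """Fraction of perturbed inputs whose output matches the baseline."""
    baseline = model(text)
    variants = perturbations(text, rng)
    same = sum(model(v) == baseline for v in variants)
    return same / len(variants)


def brittle_classifier(text):
    """Toy model: exact lowercase keyword match, so case changes break it."""
    return "refund" if "refund" in text else "other"


rng = random.Random(0)
score = stability(brittle_classifier, "please refund my order", rng)
```

A score below 1.0 is the brittleness measurement; here the case change alone is enough to flip the toy classifier's output.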
Threshold testing
Walk inputs across a decision boundary — refusal, classification, confidence cutoff — to find exactly where the model's behaviour flips, and whether it flips in the right place.
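A simple version of the walk is a sweep: step an input across its range, record every point where the decision changes, and compare the flip location with the intended boundary. In this sketch `moderation_stub`, its cutoff, and `INTENDED_CUTOFF` are hypothetical values chosen so that the flip lands in the wrong place.

```python
INTENDED_CUTOFF = 0.50  # where the (assumed) policy says the flip should be


def find_flips(decide, inputs):
    """Return (previous_input, input) pairs where the decision changes."""
    flips = []
    previous = inputs[0]
    for current in inputs[1:]:
        if decide(current) != decide(previous):
            flips.append((previous, current))
        previous = current
    return flips


def moderation_stub(toxicity_score):
    """Toy system whose actual cutoff (0.55) differs from the intended one."""
    return "refuse" if toxicity_score >= 0.55 else "allow"


scores = [i / 100 for i in range(101)]
flips = find_flips(moderation_stub, scores)

# The flip is misplaced if the intended cutoff lies outside the flip interval.
before, after = flips[0]
misplaced = not (before < INTENDED_CUTOFF <= after)
```

More than one entry in `flips` would itself be a finding, since it means the behaviour flips back and forth along the sweep.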