Threshold testing
Walk inputs across a decision boundary — refusal, classification, confidence cutoff — to find exactly where the model's behaviour flips, and whether it flips in the right place.
Published June 26, 2026
How it works
Many AI behaviours hinge on a threshold: refuse versus answer, flag versus allow, escalate versus handle. Threshold testing sweeps inputs from clearly-one-side to clearly-the-other, locates the point where the verdict flips, and then asks whether that point sits where policy intends. It surfaces over-refusal (the boundary set too tight, blocking benign requests), under-refusal (the boundary set too loose, letting through requests that should be blocked), and the unstable middle band where small changes to the input flip the verdict.
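A sweep of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used for the published findings: the ladder of prompts, the `refuses` callable, the 0.5 cut for "flipped", and the `demo_model` stub are all hypothetical stand-ins introduced here.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def refusal_rates(ladder: Sequence[str],
                  refuses: Callable[[str], bool],
                  trials: int = 5) -> List[float]:
    """Fraction of trials refused at each rung, ordered benign -> harmful."""
    return [sum(1 for _ in range(trials) if refuses(p)) / trials for p in ladder]


def transition_index(rates: Sequence[float]) -> Optional[int]:
    """First rung refused at least half the time (assumed cut: 0.5)."""
    for i, rate in enumerate(rates):
        if rate >= 0.5:
            return i
    return None  # the model never flips anywhere on this ladder


def unstable_band(rates: Sequence[float]) -> Optional[Tuple[int, int]]:
    """First and last rung where the verdict is mixed across trials."""
    mixed = [i for i, rate in enumerate(rates) if 0.0 < rate < 1.0]
    return (mixed[0], mixed[-1]) if mixed else None


def diagnose(observed: Optional[int], intended: int) -> str:
    """Compare the observed flip point with the rung policy intends."""
    if observed is None or observed > intended:
        return "under-refusal"  # flips too late, or not at all
    if observed < intended:
        return "over-refusal"   # flips before policy says it should
    return "on-policy"


def demo_model(prompt: str) -> bool:
    """Hypothetical stub: refuses every rung numbered 3 or higher."""
    return int(prompt.split("-")[1]) >= 3


ladder = [f"rung-{i}" for i in range(6)]  # placeholder prompts
rates = refusal_rates(ladder, demo_model)
observed = transition_index(rates)
verdict = diagnose(observed, intended=4)  # policy assumed to allow rungs 0-3
```

Repeating each rung several times is what exposes the middle band: a deterministic stub like `demo_model` yields no band at all, whereas a sampled model typically shows mixed verdicts near the flip point.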
When to use it
Tuning and auditing safety filters, content classifiers, and any allow/deny or confidence cutoff; diagnosing over- and under-refusal.
Limitations
Boundaries shift between versions and have to be re-mapped after upgrades, and a single threshold can hide very different behaviour across different request types.
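Both limitations can be made concrete by recording the flip point per request type and per version. The sketch below is illustrative only: the category names and rung numbers are invented placeholders, not measurements, and `boundary_drift` and `spread` are hypothetical helper names.

```python
from typing import Dict


def boundary_drift(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, int]:
    """Per-category shift in flip point; positive means the new version flips later."""
    return {cat: new[cat] - old[cat] for cat in old if cat in new}


def spread(boundaries: Dict[str, int]) -> int:
    """Gap between the loosest and tightest category within one version."""
    return max(boundaries.values()) - min(boundaries.values())


# Invented placeholder values: the rung at which each category first flips.
v1 = {"medical": 3, "security": 4, "legal": 3}
v2 = {"medical": 3, "security": 2, "legal": 5}

drift = boundary_drift(v1, v2)
same_total = sum(v1.values()) == sum(v2.values())
```

In this made-up case the summed flip points are identical across the two versions, so an aggregate figure would report no change, while the per-category drift shows one boundary tightening by two rungs and another loosening by two.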
Method yield
- Findings: 1
- Versions spanned: 4
- Yield score: 3
Severity-weighted across the published findings below.
Findings it surfaces (1)
Documented failures this method catches — the evidence it works.
Cite this
Qlarify Labs. (2026). Threshold testing. Retrieved from https://labs.qlarify.fi/methods/threshold-testing