Boundary · Emerging

Threshold testing

Walk inputs across a decision boundary — refusal, classification, confidence cutoff — to find exactly where the model's behaviour flips, and whether it flips in the right place.

Published June 26, 2026

How it works

Many AI behaviours hinge on a threshold: refuse versus answer, flag versus allow, escalate versus handle. Threshold testing sweeps inputs from clearly on one side to clearly on the other and locates the transition, then asks whether it sits where policy intends. It surfaces over-refusal (the boundary set too tight, blocking benign requests), under-refusal (the boundary set too loose, letting through requests that should be blocked), and the unstable middle band where small changes to the input flip the verdict.
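The sweep described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `refuses` callable stands in for whatever model or filter is under test, and the function names, the 0.5 flip cutoff, and the 0.2/0.8 band edges are all assumptions chosen for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sweep(inputs: Sequence[str], refuses: Callable[[str], bool],
          trials: int = 5) -> List[float]:
    """Refusal rate per input. Inputs must be ordered from clearly
    benign to clearly disallowed; each one is sampled `trials` times."""
    return [sum(refuses(x) for _ in range(trials)) / trials for x in inputs]


def locate_boundary(rates: Sequence[float], low: float = 0.2,
                    high: float = 0.8) -> Tuple[Optional[int], List[int]]:
    """Return the first index refused at least half the time, plus the
    indices in the unstable band (rate strictly between low and high)."""
    flip = next((i for i, r in enumerate(rates) if r >= 0.5), None)
    unstable = [i for i, r in enumerate(rates) if low < r < high]
    return flip, unstable


def diagnose(flip: Optional[int], intended: int) -> str:
    """Compare the observed flip index with the index policy intends."""
    if flip is None or flip > intended:
        return "under-refusal"  # refuses too late, or never
    if flip < intended:
        return "over-refusal"   # refuses earlier than policy intends
    return "on-policy"


# Hypothetical stand-in for a real model call: refuses level 2 and above.
def stub_refuses(prompt: str) -> bool:
    return int(prompt.split()[-1]) >= 2


prompts = [f"request at level {n}" for n in range(6)]
rates = sweep(prompts, stub_refuses)
flip, unstable = locate_boundary(rates)
verdict = diagnose(flip, intended=3)
```

With a real model the verdicts are usually noisy, which is why each input is sampled several times: the unstable band only shows up as refusal rates that are neither near 0 nor near 1.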

When to use it

Tuning and auditing safety filters, content classifiers, and any allow/deny or confidence cutoff; diagnosing over- and under-refusal.

Limitations

Boundaries shift between versions and have to be re-mapped after upgrades, and a single threshold can hide very different behaviour across different request types.
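One way to handle both limitations is to keep a separate boundary per request type and diff the maps after each upgrade. The sketch below assumes each sweep is already recorded as a list of refuse/allow verdicts ordered from benign to disallowed; the category names and verdicts are made-up illustrative data, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def flip_index(verdicts: List[bool]) -> Optional[int]:
    """Index of the first refusal in a benign-to-disallowed sweep."""
    return next((i for i, refused in enumerate(verdicts) if refused), None)


def boundary_map(sweeps: Dict[str, List[bool]]) -> Dict[str, Optional[int]]:
    """One flip index per request type, instead of a single global one."""
    return {category: flip_index(v) for category, v in sweeps.items()}


def boundary_drift(old: Dict[str, Optional[int]],
                   new: Dict[str, Optional[int]]
                   ) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Request types whose flip index differs between two versions."""
    return {c: (old.get(c), new.get(c))
            for c in old.keys() | new.keys()
            if old.get(c) != new.get(c)}


# Illustrative verdicts only (True = refused), not measured results.
v1 = {"medical": [False, False, True, True],
      "security": [False, True, True, True]}
v2 = {"medical": [False, False, False, True],
      "security": [False, True, True, True]}

map_v1 = boundary_map(v1)
drift = boundary_drift(map_v1, boundary_map(v2))
```

In this made-up data the two request types flip at different points, and only the "medical" boundary moves between versions, which is the kind of change a single aggregate threshold could mask.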

Method yield

Findings: 1
Versions spanned: 4
Yield score: 3 (1 Medium)

Severity-weighted across the published findings below.

Findings it surfaces (1)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Threshold testing. Retrieved from https://labs.qlarify.fi/methods/threshold-testing