Differential · Established

Differential testing

Run the same input across models or versions and treat divergence as a signal worth investigating.

Published June 26, 2026

How it works

When two comparable systems disagree on the same input, at least one is wrong, or the behavior is unstable. Comparing outputs across providers, or across versions of one model, surfaces regressions and version-specific quirks cheaply, and it needs no labelled oracle.
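The comparison loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the harness behind the published findings: `model_v1`, `model_v2`, `normalize`, and `find_divergences` are hypothetical names, and the two "models" are toy functions standing in for real provider or version calls.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for two model versions. In practice these would
# wrap calls to the providers or versions being compared.
def model_v1(prompt: str) -> str:
    return prompt.strip().lower()

def model_v2(prompt: str) -> str:
    # Simulated regression: this version drops trailing punctuation.
    return prompt.strip().lower().rstrip("?!.")

def normalize(output: str) -> str:
    """Collapse whitespace so purely cosmetic differences are not flagged."""
    return " ".join(output.split())

def find_divergences(
    inputs: List[str],
    systems: Dict[str, Callable[[str], str]],
) -> List[dict]:
    """Run every input through every system and keep the cases that disagree."""
    divergent = []
    for text in inputs:
        outputs = {name: normalize(fn(text)) for name, fn in systems.items()}
        # More than one distinct output means the systems diverge on this input.
        if len(set(outputs.values())) > 1:
            divergent.append({"input": text, "outputs": outputs})
    return divergent

inputs = ["Hello world", "Is this stable?", "  Spaces  "]
divergences = find_divergences(inputs, {"v1": model_v1, "v2": model_v2})
for case in divergences:
    print(case["input"], case["outputs"])
```

Exact string equality after normalization is the simplest possible comparison. For free-form model output, a looser equivalence check (parsed structure, extracted answer, or a similarity threshold) is usually needed to keep noise down, and that choice is left open here.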

When to use it

Regression testing across model upgrades; provider selection; flagging unstable behaviors.

Limitations

Agreement doesn't prove correctness (both can be wrong the same way). Best combined with an oracle on the divergent cases.
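One way to combine the two, sketched under assumptions: run the oracle only on the cases where the systems diverged, and record which systems it contradicts. The `triage` function, the case layout, and `toy_oracle` (a word counter standing in for a human label or reference implementation) are all hypothetical.

```python
from typing import Callable, Dict, List

def triage(
    divergent_cases: List[dict],
    oracle: Callable[[str], str],
) -> List[dict]:
    """For each divergent case, ask the oracle and list the systems it contradicts."""
    report = []
    for case in divergent_cases:
        expected = oracle(case["input"])
        wrong = sorted(
            name for name, out in case["outputs"].items() if out != expected
        )
        report.append(
            {"input": case["input"], "expected": expected, "wrong": wrong}
        )
    return report

# Toy oracle (an assumption for illustration): the correct answer is the word count.
def toy_oracle(text: str) -> str:
    return str(len(text.split()))

cases = [
    {"input": "one two three", "outputs": {"v1": "3", "v2": "2"}},
    {"input": "a b", "outputs": {"v1": "1", "v2": "3"}},
]
report = triage(cases, toy_oracle)
for row in report:
    print(row["input"], "expected", row["expected"], "wrong:", row["wrong"])
```

In the second case both systems disagree with each other and with the oracle, which shows why a divergence alone does not say which side is right. Cases where both systems agree never reach the oracle in this sketch, so shared errors remain invisible to it.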

Method yield

Findings: 7
Versions spanned: 7
Yield score: 21
1 High, 5 Medium, 1 Low

Severity-weighted across the published findings below.

Findings it surfaces (7)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Differential testing. Retrieved from https://labs.qlarify.fi/methods/differential-testing