Distributional testing (KS test, Monte Carlo)
Sample the model many times and test the distribution of its outputs — not any single answer — for drift, miscalibration, or instability.
Published June 26, 2026
How it works
A non-deterministic system has to be judged statistically. By sampling repeatedly and treating the outputs as a distribution, you can ask sharper questions. Has this version's output distribution shifted from the previous version's (a two-sample Kolmogorov–Smirnov test)? Does stated confidence track actual accuracy (calibration)? How widely does a metric spread under resampling (Monte Carlo)? This turns stochasticity from a nuisance into a measurable property, which makes it the natural lens for stability, longevity, and decay.
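The two-sample Kolmogorov–Smirnov statistic D is the largest vertical gap between the two empirical CDFs. For large samples, sqrt(nm/(n+m))·D follows the Kolmogorov distribution when nothing has changed, which gives a p-value. Below is a minimal pure-Python sketch, assuming each sampled output has already been reduced to one scalar score (which score to use is a modelling choice this page does not fix). The helper names and example scores are illustrative. The p-value is only the asymptotic approximation for continuous data and is rough for samples as small as the example's. scipy.stats.ks_2samp implements the same test, including an exact mode for small samples.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is reached at one of the pooled sample points.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect.bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for D with sample sizes n and m (assumes continuous data)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # Here the Kolmogorov tail probability differs from 1 by less than 1e-4.
        return 1.0
    # Kolmogorov tail: Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    q = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


# Hypothetical per-response scores from two model versions.
old_scores = [0.61, 0.72, 0.55, 0.68, 0.70, 0.64, 0.59, 0.66]
new_scores = [0.74, 0.81, 0.69, 0.77, 0.72, 0.80, 0.75, 0.71]
d = ks_statistic(old_scores, new_scores)
print(d, ks_pvalue(d, len(old_scores), len(new_scores)))
```

Which p-value counts as "shifted" is a threshold you choose. The test only says the two distributions differ, not in which direction or whether that is good.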
When to use it
Calibration studies; detecting distribution shift between versions; quantifying variance in a metric; any claim about stability rather than a single output.
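For calibration studies and metric variance, one common recipe (an illustrative choice, not necessarily the one used for the findings on this page) is binned expected calibration error (ECE), combined with bootstrap resampling to show how much that number moves from sample to sample. The sketch assumes each sampled answer comes with a stated confidence in [0, 1] and a 0/1 correctness label. How those are obtained is left open, and the function names and data are hypothetical.

```python
import random


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 joins the top bin.
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            accuracy = sum(y for _, y in b) / len(b)
            mean_conf = sum(c for c, _ in b) / len(b)
            ece += len(b) / total * abs(accuracy - mean_conf)
    return ece


def bootstrap_interval(confidences, correct, metric, n_resamples=1000, seed=0):
    """Monte Carlo spread of `metric`: recompute it on resamples drawn with
    replacement and return a rough 95% percentile interval."""
    rng = random.Random(seed)
    n = len(confidences)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([confidences[i] for i in idx], [correct[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]


# Hypothetical (stated confidence, was-it-correct) pairs from repeated sampling.
conf = [0.9, 0.8, 0.95, 0.7, 0.6, 0.85, 0.9, 0.65, 0.75, 0.99]
hit = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]
print(expected_calibration_error(conf, hit))
print(bootstrap_interval(conf, hit, expected_calibration_error))
```

A wide interval is itself a finding: it says a single ECE number from this many samples should not be compared across versions at face value.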
Limitations
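Needs enough samples to be meaningful: with too few draws, a real shift can go undetected, so a non-significant result is not by itself evidence of stability. A shifted distribution tells you that behaviour changed, not whether it changed for the better. Choosing the right statistic and threshold is itself a modelling decision.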
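How many samples are "enough" can itself be estimated by Monte Carlo: simulate a known shift many times and count how often the KS test catches it. The sketch below assumes normally distributed scores and a pure shift in the mean (both purely illustrative), and rejects using the standard large-sample critical value for equal sample sizes at alpha = 0.05. The function names and the shift size are hypothetical.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
               for x in a + b)


def ks_power(n, shift, trials=300, c_alpha=1.358, seed=0):
    """Monte Carlo estimate of how often the KS test detects a real shift.

    Old-version scores ~ N(0, 1), new-version scores ~ N(shift, 1), n draws each.
    Rejects when D exceeds the large-sample critical value c_alpha * sqrt(2 / n);
    c_alpha = 1.358 corresponds to alpha = 0.05.
    """
    rng = random.Random(seed)
    critical = c_alpha * math.sqrt(2 / n)
    detected = 0
    for _ in range(trials):
        old = [rng.gauss(0, 1) for _ in range(n)]
        new = [rng.gauss(shift, 1) for _ in range(n)]
        if ks_statistic(old, new) > critical:
            detected += 1
    return detected / trials


# Same underlying shift, three sample sizes.
for n in (20, 100, 500):
    print(n, ks_power(n, shift=0.3))
```

With the same underlying shift, the detection rate should be far lower at 20 samples than at 500. Rerunning this with a score distribution and threshold closer to your own is a cheap way to size a study before spending the sampling budget.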
Method yield
- Findings: 1
- Versions spanned: 4
- Yield score: 3
Severity-weighted across the published findings below.
Findings it surfaces (1)
Documented failures this method catches — the evidence it works.
Cite this
Qlarify Labs. (2026). Distributional testing (KS test, Monte Carlo). Retrieved from https://labs.qlarify.fi/methods/distributional-testing