Other, Emerging

Distributional testing (KS test, Monte Carlo)

Sample the model many times and test the distribution of its outputs — not any single answer — for drift, miscalibration, or instability.

Published June 26, 2026

How it works

A non-deterministic system has to be judged statistically. Sampling repeatedly and treating the outputs as a distribution lets you ask sharper questions. Has this version's output distribution shifted from the last one's (a two-sample Kolmogorov–Smirnov test)? Does stated confidence track actual accuracy (calibration)? How wide is the spread of a metric under resampling (Monte Carlo)? This turns stochasticity from a nuisance into a measurable property, which makes it the natural lens for stability, longevity, and decay.
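The version-to-version comparison can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: the function names, the simulated scores, and the sample sizes are all hypothetical, and the p-value uses the plain large-sample Kolmogorov series, which is only an approximation for small samples.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value equal to x so ties move together.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic p-value for the statistic d (large-sample approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the true value is ~1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * total))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-sample quality scores for two model versions.
    old_scores = [rng.gauss(0.70, 0.10) for _ in range(300)]
    new_scores = [rng.gauss(0.65, 0.10) for _ in range(300)]
    d = ks_statistic(old_scores, new_scores)
    p = ks_p_value(d, len(old_scores), len(new_scores))
    print(f"D = {d:.3f}, approximate p = {p:.4f}")
```

In practice a library routine such as `scipy.stats.ks_2samp` is the more robust choice; the hand-rolled version is shown only to make the statistic itself visible.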

When to use it

Calibration studies; detecting distribution shift between versions; quantifying variance in a metric; any claim about stability rather than a single output.
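Two of these uses, calibration and metric variance, can be combined in one sketch. The example below computes an expected calibration error (the weighted gap between stated confidence and observed accuracy per confidence bin) and then estimates its spread with a percentile bootstrap. The function names, the bin count, and the resample count are illustrative assumptions, not settings taken from this page.

```python
import random


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between stated confidence and accuracy, per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 joins the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / total * abs(mean_conf - accuracy)
    return ece


def bootstrap_interval(confidences, correct, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ECE, resampling pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(confidences, correct))
    stats = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        stats.append(
            expected_calibration_error(
                [c for c, _ in sample], [ok for _, ok in sample]
            )
        )
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical data: the model states 0.9 confidence but is right half the time.
    confs = [0.9] * 40
    outcomes = [True] * 20 + [False] * 20
    print("ECE:", expected_calibration_error(confs, outcomes))
    print("95% bootstrap interval:", bootstrap_interval(confs, outcomes))
```

Reporting the interval alongside the point estimate makes it clear whether a difference in calibration between two versions is larger than the resampling noise.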

Limitations

The method needs enough samples to be meaningful: with too few, a test cannot separate a real shift from sampling noise. A shifted distribution also tells you only that behaviour changed, not whether it changed for the better. Choosing the right statistic and threshold is itself a modelling decision.
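The dependence on sample size and threshold can be made concrete for the two-sample KS test. The sketch below uses the standard large-sample approximation for the rejection threshold of the KS statistic; the function name and the chosen sample sizes are illustrative, and the significance level of 0.05 is an assumption rather than a recommendation.

```python
import math


def ks_critical_value(n, m, alpha=0.05):
    """Approximate large-sample rejection threshold for the two-sample KS statistic."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    # Smallest CDF gap that counts as a detectable shift at each sample size.
    for n in (20, 100, 500, 2000):
        print(n, round(ks_critical_value(n, n), 3))
```

With 20 samples per version the empirical CDFs must differ by roughly 0.43 before the test rejects; with 2000 per version a gap of about 0.04 is enough. Changing `alpha` moves the threshold too, which is one place where the choice of threshold becomes a modelling decision.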

Method yield

Findings: 1
Versions spanned: 4
Yield score: 3 (1 Medium)

Severity-weighted across the published findings below.

Findings it surfaces (1)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Distributional testing (KS test, Monte Carlo). Retrieved from https://labs.qlarify.fi/methods/distributional-testing