PaperHigh credibilityNeurIPS 2019 (arXiv:1810.11953) · Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton · October 29, 2018

Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

Read the original ↗Archived copy

Our summary

A systematic comparison of two-sample statistical tests for detecting when a system's input or output distribution has shifted, including how to identify and quantify the shift.

Why it matters

The methodological backbone of distributional testing — judging a stochastic system by its output distribution, not any single sample.

Cited by these methods

🔬 Distributional testing (KS test, Monte Carlo)

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. Retrieved from https://labs.qlarify.fi/references/failing-loudly-dataset-shift-2019