← Reference library
PaperHigh credibilityNeurIPS 2019 (arXiv:1810.11953) · Stephan Rabanser, Stephan Günnemann, Zachary C. Lipton · October 29, 2018
Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift
Our summary
A systematic comparison of two-sample statistical tests for detecting when a system's input or output distribution has shifted, including how to identify and quantify the shift.
Why it matters
The methodological backbone of distributional testing — judging a stochastic system by its output distribution, not any single sample.
Cited by these methods
Published June 26, 2026
Cite this
Qlarify Labs. (2026). Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. Retrieved from https://labs.qlarify.fi/references/failing-loudly-dataset-shift-2019