Drift & decay monitoring
Re-run a fixed suite against each release and over time, watching for the quiet regressions and capability decay that a one-off evaluation can't see.
Published June 26, 2026
How it works
Models don't only improve — fixes come undone, behaviours shift under silent updates, and accuracy on a task you relied on can decay between releases. Drift monitoring keeps a stable, versioned suite and re-runs it continuously, comparing each result against the baseline so a regression shows up as a trend, not a surprise in production. It is the longitudinal complement to benchmarking: benchmarks tell you where you stand, drift monitoring tells you which way you're moving.
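The baseline comparison described above can be sketched in a few lines. This is a minimal illustration, not the tooling behind this page: `RunResult`, `pass_rate`, and `compare_to_baseline` are hypothetical names, and it assumes each suite case reduces to a simple pass/fail outcome keyed by a case id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One execution of the fixed suite against one model release (hypothetical shape)."""
    model_version: str
    suite_version: str
    passed: dict[str, bool]  # case id -> did the case pass?


def pass_rate(run: RunResult) -> float:
    """Fraction of suite cases that passed in this run."""
    return sum(run.passed.values()) / len(run.passed)


def compare_to_baseline(baseline: RunResult, current: RunResult) -> dict:
    """Report the pass-rate delta plus the cases that flipped since the baseline."""
    # Results from different suite versions are not comparable, so refuse to mix them.
    if baseline.suite_version != current.suite_version:
        raise ValueError("suite versions differ; re-run the baseline first")
    # A case missing from the current run is treated as a failure.
    regressed = sorted(
        case for case, ok in baseline.passed.items()
        if ok and not current.passed.get(case, False)
    )
    fixed = sorted(
        case for case, ok in baseline.passed.items()
        if not ok and current.passed.get(case, False)
    )
    return {
        "delta": pass_rate(current) - pass_rate(baseline),
        "regressed": regressed,
        "fixed": fixed,
    }
```

Reporting the flipped cases alongside the aggregate delta matters because one regression and one fix can cancel out in the pass rate while the behaviour underneath has still moved.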
When to use it
Long-lived systems on top of a model you don't control; tracking hosted models that update silently; guarding against regressions resurfacing after a fix.
Limitations
Only as sensitive as the suite it re-runs, and a drift signal flags that something moved without explaining why. Needs a stable baseline and disciplined versioning to avoid false alarms.
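Two guards against false alarms can be sketched as follows, under stated assumptions. `suite_fingerprint` and `drop_is_significant` are hypothetical helpers: the first assumes suite cases are JSON-serialisable dicts with an `id` field, and the second uses a one-sided two-proportion z-test as a rough noise filter. Because the same cases are re-run each time, a paired test such as McNemar's would be more precise; the z-test is used here only as a simple approximation.

```python
import hashlib
import json
import math


def suite_fingerprint(cases: list[dict]) -> str:
    """Hash the suite contents so any edit to a case yields a new version id."""
    # Sort cases and keys so the hash does not depend on ordering.
    canonical = json.dumps(
        sorted(cases, key=lambda c: c["id"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_is_significant(
    base_pass: int,
    base_n: int,
    cur_pass: int,
    cur_n: int,
    z_threshold: float = 1.645,  # roughly a 5% one-sided level; an assumed default
) -> bool:
    """Return True if the current pass rate is lower than baseline beyond sampling noise."""
    p_base = base_pass / base_n
    p_cur = cur_pass / cur_n
    pooled = (base_pass + cur_pass) / (base_n + cur_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / cur_n))
    if se == 0:
        # Both runs passed everything or failed everything, so there is no drop.
        return False
    return (p_base - p_cur) / se > z_threshold
```

Storing the fingerprint with every result makes it possible to tell a model change from a suite change, and the significance check keeps a two-point wobble on a 100-case suite from being reported as drift.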
Method yield
- Findings: 2
- Versions spanned: 1
- Yield score: 7
The yield score is severity-weighted across the published findings below.
Findings it surfaces (2)
Documented failures this method catches — the evidence it works.
- Model behavior drifts between versions: a fixed task can regress [Medium · Reasoning]
The same model name can perform very differently across dated snapshots: a task that passed on one release regresses on the next, with no announcement.
How it found it: Re-running the same fixed task (e.g. the prime-number set) against each snapshot is what makes the regression visible as a trend.
- A production update made the model sycophantic and was rolled back [High · Bias]
An April 2025 GPT-4o update tuned on user feedback became markedly more sycophantic, validating harmful or delusional claims, and was rolled back within days.
How it found it: A deployment eval tracking sycophancy across releases would have flagged it; its absence is precisely why it slipped through.
References & further reading
- A Survey on Concept Drift Adaptation
João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, Abdelhamid Bouchachia · ACM Computing Surveys 46(4) · April 1, 2014
- How Is ChatGPT's Behavior Changing over Time?
Lingjiao Chen, Matei Zaharia, James Zou · arXiv:2307.09009 · July 18, 2023
- Sycophancy in GPT-4o: What Happened and What We're Doing About It
OpenAI · April 29, 2025
Cite this
Qlarify Labs. (2026). Drift & decay monitoring. Retrieved from https://labs.qlarify.fi/methods/drift-monitoring