Paper · High credibility · arXiv:2307.09009 · Lingjiao Chen, Matei Zaharia, James Zou · July 18, 2023
How Is ChatGPT's Behavior Changing over Time?
Our summary
Evaluates GPT-3.5 and GPT-4 on identical tasks across their March 2023 and June 2023 snapshots and finds large swings with no consistent direction. Most starkly, GPT-4's accuracy at identifying prime vs. composite numbers fell from 84% to 51% between the two snapshots, alongside degraded instruction-following.
Why it matters
Hard evidence that a hosted model's capability is not monotonic and can silently regress between releases — the case for re-running a fixed suite over time rather than measuring once.
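As an illustration of re-running a fixed suite over time, here is a minimal sketch. It is not the paper's evaluation harness: the suite contents, the `snapshot_a` and `snapshot_b` stand-ins, and the 0.05 tolerance are all assumptions chosen for the example, and a real check would replace the stand-ins with calls to the hosted model.

```python
from typing import Callable, List


def is_prime(n: int) -> bool:
    """Ground truth by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Fixed suite: the same inputs are reused, unchanged, for every snapshot.
# (Illustrative values only, not the paper's dataset.)
SUITE: List[int] = [7, 9, 11, 15, 17, 21, 23, 25, 29, 33]


def accuracy(answer: Callable[[int], bool], suite: List[int]) -> float:
    """Fraction of suite items where the answer matches ground truth."""
    correct = sum(answer(n) == is_prime(n) for n in suite)
    return correct / len(suite)


def regressed(old: float, new: float, tolerance: float = 0.05) -> bool:
    """True when accuracy dropped by more than the (arbitrary) tolerance."""
    return (old - new) > tolerance


# Hypothetical stand-ins for two model snapshots; not real API calls.
def snapshot_a(n: int) -> bool:
    return is_prime(n)  # answers every item correctly


def snapshot_b(n: int) -> bool:
    return True  # always answers "prime"


old_score = accuracy(snapshot_a, SUITE)
new_score = accuracy(snapshot_b, SUITE)
print(f"old={old_score:.2f} new={new_score:.2f} "
      f"regressed={regressed(old_score, new_score)}")
```

Keeping the suite fixed is what makes the two scores comparable: any change in accuracy then reflects the model, not the inputs.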
Published June 26, 2026
Cite this
Qlarify Labs. (2026). How Is ChatGPT's Behavior Changing over Time? Retrieved from https://labs.qlarify.fi/references/chatgpt-behavior-drift-2023