Paper · High credibility · arXiv:2311.17035 · Milad Nasr, Nicholas Carlini, et al. · November 28, 2023

Scalable Extraction of Training Data from (Production) Language Models

Our summary

Shows that prompting aligned ChatGPT to repeat a single token endlessly makes it diverge from chat-style output and emit memorized training data, including PII, at roughly 150x the normal rate. The attack recovered over ten thousand unique training examples for about $200.
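As a rough illustration of what "diverging" from a repeated token looks like in a response, the sketch below splits a response into the leading run of repeats and whatever text follows. It is a minimal sketch, not the authors' method: the function name `split_at_divergence` is hypothetical, and it assumes simple whitespace splitting rather than the model's real tokenizer. Deciding whether the trailing text is actually memorized requires comparison against reference data, which is not shown.

```python
from typing import Tuple

# Punctuation stripped from each word before comparison (an assumption
# made for this sketch; real outputs may need more careful handling).
_PUNCT = ".,;:!?\"'"


def split_at_divergence(output: str, token: str) -> Tuple[int, str]:
    """Return (leading repeat count, text after the repeats).

    Hypothetical helper: words are compared case-insensitively after
    stripping surrounding punctuation. An empty tail means the output
    never left the repeated token.
    """
    words = output.split()
    target = token.lower()
    repeats = 0
    for word in words:
        if word.strip(_PUNCT).lower() != target:
            break  # first word that is not the repeated token
        repeats += 1
    return repeats, " ".join(words[repeats:])


if __name__ == "__main__":
    sample = "poem poem poem, Poem Contact Jane at the office"
    count, tail = split_at_divergence(sample, "poem")
    print(count, repr(tail))
```

A non-empty tail only marks a candidate for inspection; on its own it says nothing about whether the text came from training data.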

Why it matters

Both a memorization/privacy failure and a model-extraction surface: a deployed, aligned model leaks parts of its own training data under the right probe.

Cited by these methods

🔬 Distillation & model-extraction probing

Related findings (1)

Verbatim training data extracted from a deployed chatbot · High
A 'divergence' attack made aligned ChatGPT abandon its chat format and emit memorized training data verbatim, recovering over ten thousand unique examples for about $200.

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Scalable Extraction of Training Data from (Production) Language Models. Retrieved from https://labs.qlarify.fi/references/scalable-extraction-training-data-2023