← Reference library
PaperHigh credibilityarXiv:2311.17035 · Milad Nasr, Nicholas Carlini, et al. · November 28, 2023

Scalable Extraction of Training Data from (Production) Language Models

Our summary

Shows that prompting aligned ChatGPT to endlessly repeat a token makes it diverge from chat-style output and emit memorized training data — including PII — at ~150x the normal rate, recovering over ten thousand unique training examples for about $200.

Why it matters

Both a memorization/privacy failure and a model-extraction surface: a deployed, aligned model leaks its own training corpus under the right probe.

Cited by these methods

Related findings (1)

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Scalable Extraction of Training Data from (Production) Language Models. Retrieved from https://labs.qlarify.fi/references/scalable-extraction-training-data-2023