Paper · High credibility · ICML 2024 (arXiv:2403.06634) · Nicholas Carlini, Daniel Paleka, et al. · March 9, 2024

Stealing Part of a Production Language Model

Our summary

The first precise model-extraction attack on deployed LLMs. Using only ordinary black-box API access, it recovers the final embedding projection layer (up to symmetries) and the hidden dimension of production models. The authors extracted the projection layer of OpenAI's Ada and Babbage for under $20, with OpenAI's approval and subsequent mitigation.
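The hidden-dimension part of this result rests on a linear-algebra observation: the final layer maps a hidden state of width h to vocabulary logits, so a stack of logit vectors has rank at most h regardless of vocabulary size. The sketch below illustrates that idea on purely synthetic data. It is not the authors' code and makes no API calls; the function name `estimate_hidden_dim`, the toy sizes, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def estimate_hidden_dim(logits: np.ndarray) -> int:
    """Estimate the rank of an (n_queries, vocab_size) logit matrix.

    Hypothetical helper: picks the index of the largest drop between
    consecutive singular values on a log scale.
    """
    singular_values = np.linalg.svd(logits, compute_uv=False)
    log_gaps = np.log(singular_values[:-1]) - np.log(singular_values[1:])
    return int(np.argmax(log_gaps)) + 1


# Synthetic stand-in for a model's final layer (assumed toy sizes).
rng = np.random.default_rng(0)
hidden_dim, vocab_size, n_queries = 64, 500, 200

projection = rng.normal(size=(vocab_size, hidden_dim))    # final projection matrix
hidden_states = rng.normal(size=(n_queries, hidden_dim))  # one row per query

# Each query yields a full logit vector; small noise mimics finite precision.
logits = hidden_states @ projection.T
logits += 1e-6 * rng.normal(size=logits.shape)

estimated = estimate_hidden_dim(logits)
print(f"true hidden dim = {hidden_dim}, estimated = {estimated}")
```

The estimate is only meaningful when the number of queries exceeds the hidden dimension; with fewer rows the matrix is full rank and no gap appears. Against a real API, full logit vectors are usually not returned directly, so obtaining them is a separate step that this sketch does not cover.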

Why it matters

Establishes that a deployed model's API is itself a leak surface for proprietary internals — the exact confidentiality risk model-extraction probing is meant to quantify.

Cited by these methods

🔬 Distillation & model-extraction probing

Related findings (1)

Production model internals extracted through the API · High
With ordinary black-box API access, researchers recovered the embedding projection layer and hidden dimension of production models for under $20 — the first precise model-stealing attack on deployed LLMs.

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Stealing Part of a Production Language Model. Retrieved from https://labs.qlarify.fi/references/stealing-part-production-lm-2024