Production model internals extracted through the API
With ordinary black-box API access, researchers recovered the embedding projection layer and hidden dimension of production models for under $20, the first model-stealing attack to extract precise internals from deployed LLMs.
Published June 26, 2026
- Reproducibility: Often
- Severity: High
- Confidence: Reviewer-confirmed
Details
Carlini et al. demonstrated a model-extraction attack that, from query access alone, recovers the final embedding projection matrix of black-box production models and reveals their hidden dimension. They extracted the projection matrix of OpenAI's Ada and Babbage models for under $20, with OpenAI's approval, and OpenAI subsequently deployed mitigations. The result establishes that a deployed model's API is itself a leak surface for proprietary internals, which is exactly the risk that model-extraction probing is meant to quantify.
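The core observation behind recovering the hidden dimension can be illustrated on a toy model. Because the final layer maps a hidden state of width h to vocabulary logits through a single matrix, logit vectors collected from many queries span a subspace of dimension at most h, so the singular values of the stacked logits drop sharply after the h-th one. The sketch below is a simulation only: `simulate_logits` and `estimate_hidden_dim` are hypothetical helper names, the "model" is a random matrix rather than a real API, and it assumes full logit vectors are available, which production APIs generally do not expose directly.

```python
import numpy as np


def simulate_logits(n_queries: int, hidden_dim: int, vocab_size: int, seed: int = 0) -> np.ndarray:
    """Toy stand-in for an API (hypothetical): logits = hidden_states @ W.T."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((vocab_size, hidden_dim))  # secret projection matrix
    H = rng.standard_normal((n_queries, hidden_dim))   # one hidden state per query
    return H @ W.T                                     # shape (n_queries, vocab_size)


def estimate_hidden_dim(logits: np.ndarray) -> int:
    """Estimate the rank of the logit matrix from the largest gap in log singular values."""
    s = np.linalg.svd(logits, compute_uv=False)
    # Clamp numerical-noise singular values so exact zeros cannot create a spurious gap.
    s = np.maximum(s, s[0] * np.finfo(float).eps)
    gaps = np.log(s[:-1]) - np.log(s[1:])
    return int(np.argmax(gaps)) + 1


# More queries than the (unknown) hidden dimension are needed for the drop to appear.
logits = simulate_logits(n_queries=96, hidden_dim=32, vocab_size=200)
estimated = estimate_hidden_dim(logits)
print("estimated hidden dimension:", estimated)
```

The same decomposition also yields a basis for the column space of the projection matrix, which is why the matrix itself is only recoverable up to an unknown linear transformation in this simplified setting.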
Found with
Evidence
References
Source: https://arxiv.org/abs/2403.06634
Cite this
Qlarify Labs. (2026). Production model internals extracted through the API. Retrieved from https://labs.qlarify.fi/findings/production-model-extraction