Needle-in-a-haystack (long-context retrieval)
Plant a specific fact at varying depths in a long context and test whether the model can retrieve it from each position.
Published June 26, 2026
How it works
Insert a unique 'needle' at different depths across increasingly long contexts, then ask the model to recall it. The results map retrieval reliability as a function of context length and needle position. That map exposes 'lost in the middle' degradation, where facts placed mid-context are recalled less reliably than facts near the start or end, and it shows the usable context length as opposed to the advertised one.
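The sweep over length and depth can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference harness: the names `build_haystack`, `run_sweep`, and `toy_model`, the needle text, and the filler sentence are all hypothetical, and context length is counted in filler sentences rather than tokens. `toy_model` is a deterministic stand-in that only "sees" the start and end of the prompt; a real evaluation would replace it with a call to the model under test.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical needle, question, and filler chosen for illustration only.
NEEDLE = "The secret passphrase for the vault is cobalt-heron-42."
QUESTION = "What is the secret passphrase for the vault?"
ANSWER = "cobalt-heron-42"
FILLER = "The committee reviewed the quarterly logistics report without comment."


def build_haystack(n_sentences: int, depth: float) -> str:
    """Return n_sentences of filler with the needle inserted at a relative depth.

    depth=0.0 places the needle first, depth=1.0 places it last.
    Real runs would use varied, natural text instead of one repeated sentence.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), NEEDLE)
    return " ".join(sentences)


def run_sweep(
    ask_model: Callable[[str], str],
    lengths: List[int],
    depths: List[float],
    trials: int = 1,
) -> Dict[Tuple[int, float], float]:
    """Return retrieval accuracy for every (context length, depth) cell."""
    grid: Dict[Tuple[int, float], float] = {}
    for n in lengths:
        for d in depths:
            hits = 0
            for _ in range(trials):
                prompt = build_haystack(n, d) + "\n\n" + QUESTION
                # Substring match is a simple scoring rule; stricter graders exist.
                if ANSWER in ask_model(prompt):
                    hits += 1
            grid[(n, d)] = hits / trials
    return grid


def toy_model(prompt: str, window: int = 400) -> str:
    """Stand-in for a model call: reads only the first and last `window` characters."""
    visible = prompt[:window] + prompt[-window:]
    return ANSWER if ANSWER in visible else "I don't know."


if __name__ == "__main__":
    results = run_sweep(toy_model, lengths=[5, 200], depths=[0.0, 0.5, 1.0])
    for (n, d), accuracy in sorted(results.items()):
        print(f"length={n:>4} depth={d:.1f} accuracy={accuracy:.2f}")
```

With the stand-in model, the short context is retrieved at every depth, while the long context fails only at the middle depth. That is the shape of result the method is designed to detect in a real model.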
When to use it
Use it to evaluate a model's long-context claims, and to test RAG and document-QA systems that depend on retrieval from long inputs.
Limitations
Retrieving a synthetic needle is easier than reasoning over a long context, so passing this test is necessary but not sufficient evidence of long-context ability.
Method yield
- Findings: 1
- Versions spanned: 4
- Yield score: 3
Severity-weighted across the published findings below.
Findings it surfaces (1)
Documented failures this method catches — the evidence it works.
Cite this
Qlarify Labs. (2026). Needle-in-a-haystack (long-context retrieval). Retrieved from https://labs.qlarify.fi/methods/needle-in-haystack