Boundary · Established

Needle-in-a-haystack (long-context retrieval)

Plant a specific fact at varying depths in a long context and test whether the model can retrieve it from each position.

Published June 26, 2026

How it works

Insert a unique 'needle' at different positions across increasingly long contexts and ask the model to recall it. The resulting grid of hits and misses maps retrieval reliability as a function of context length and needle position. This exposes 'lost in the middle' degradation, where facts placed mid-context are recalled less reliably than those near the start or end, and shows the usable (as opposed to advertised) context length.
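The sweep over context length and needle position can be sketched in a few lines of Python. This is a minimal illustration, not Qlarify Labs' harness: the needle text, the filler sentence, the helper names (`build_haystack`, `run_grid`) and the `toy_model` stand-in are all assumptions made for the example. In practice `ask_model` would wrap a real model call and the filler would be natural text measured in tokens rather than sentences.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical needle, question and filler, chosen only for illustration.
NEEDLE = "The secret passphrase is 'violet-kumquat-42'."
QUESTION = "What is the secret passphrase?"
EXPECTED = "violet-kumquat-42"
FILLER = "The quick brown fox jumps over the lazy dog."


def build_haystack(n_sentences: int, depth: float) -> str:
    """Return n_sentences of filler with the needle inserted at a
    relative depth (0.0 = start of context, 1.0 = end)."""
    sentences = [FILLER] * n_sentences
    sentences.insert(int(depth * n_sentences), NEEDLE)
    return " ".join(sentences)


def run_grid(
    ask_model: Callable[[str], str],
    lengths: List[int],
    depths: List[float],
) -> Dict[Tuple[int, float], bool]:
    """Query the model once per (length, depth) cell and record
    whether the expected answer appears in its reply."""
    results: Dict[Tuple[int, float], bool] = {}
    for n in lengths:
        for d in depths:
            prompt = f"{build_haystack(n, d)}\n\n{QUESTION}"
            results[(n, d)] = EXPECTED in ask_model(prompt)
    return results


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: it only 'reads' the first and last
    300 characters, crudely imitating lost-in-the-middle behaviour."""
    if EXPECTED in prompt[:300] or EXPECTED in prompt[-300:]:
        return EXPECTED
    return "I don't know."


if __name__ == "__main__":
    grid = run_grid(toy_model, lengths=[5, 100], depths=[0.0, 0.5, 1.0])
    for (n, d), hit in sorted(grid.items()):
        print(f"length={n:>3} depth={d:.1f} retrieved={hit}")
```

With the toy stand-in, the short context passes at every depth while the long context fails only at the middle position, which is the pattern the method is designed to surface. Substring matching is a deliberately simple scorer; a real evaluation may need a more tolerant check.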

When to use it

Evaluating long-context claims; RAG and document-QA systems.

Limitations

Retrieving a synthetic needle is easier than reasoning over a long context. Passing this test is necessary but not sufficient evidence of long-context capability.

Method yield

Findings: 1
Versions spanned: 4
Yield score: 3 (1 Medium)

Severity-weighted across the published findings below.

Findings it surfaces (1)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Needle-in-a-haystack (long-context retrieval). Retrieved from https://labs.qlarify.fi/methods/needle-in-haystack