Boundary · Established

Needle-in-a-haystack (long-context retrieval)

Plant a specific fact at varying depths in a long context and test whether the model can retrieve it from each position.

Published June 26, 2026

RAG Context window

How it works

Inserting a unique 'needle' at different positions across increasingly long contexts, then asking the model to recall it, maps retrieval reliability as a function of context length and needle position. The resulting grid exposes 'lost in the middle' degradation and shows how much of the advertised context length is actually usable.
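The sweep over length and depth can be sketched as follows. This is a minimal illustration, not the harness behind this entry: the filler text, the needle, the question, and the `ask_model` callable are all hypothetical placeholders, and length is counted in filler sentences rather than tokens for simplicity.

```python
from typing import Callable, Dict, Iterable, Tuple

# All four constants are hypothetical placeholders chosen for illustration.
FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret passcode is 7421."
QUESTION = "What is the secret passcode?"
ANSWER = "7421"


def build_haystack(n_sentences: int, depth: float, needle: str = NEEDLE) -> str:
    """Return n_sentences filler sentences with the needle inserted at a
    relative depth (0.0 = very start, 1.0 = very end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def run_grid(
    ask_model: Callable[[str, str], str],
    lengths: Iterable[int],
    depths: Iterable[float],
) -> Dict[Tuple[int, float], bool]:
    """Query the model once per (length, depth) cell and record whether the
    expected answer string appears in its reply."""
    depths = list(depths)  # allow re-iteration for every length
    results = {}
    for n in lengths:
        for d in depths:
            reply = ask_model(build_haystack(n, d), QUESTION)
            results[(n, d)] = ANSWER in reply
    return results


def fake_model(context: str, question: str) -> str:
    """Stand-in for a real model call (an assumption for this sketch):
    it only 'reads' the first and last 200 characters of the context."""
    visible = context[:200] + context[-200:]
    return ANSWER if ANSWER in visible else "I don't know."


if __name__ == "__main__":
    grid = run_grid(fake_model, lengths=[20, 100], depths=[0.0, 0.5, 1.0])
    for (n, d), ok in sorted(grid.items()):
        print(f"length={n:4d} depth={d:.2f} retrieved={ok}")
```

In a real run, `fake_model` would be replaced by a call to the system under test, length would normally be measured in tokens with that model's tokenizer, and each cell would be repeated with several needles so that a single lucky or unlucky phrasing does not decide the result. Substring matching is the simplest scoring choice and may need tightening if the answer string can plausibly appear by chance.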

When to use it

Evaluating long-context claims; RAG and document-QA systems.

Limitations

Retrieving a synthetic needle is easier than reasoning over a real long context, so passing this test is necessary but not sufficient evidence of long-context capability.

Method yield

Findings: 1
Versions spanned: 4
Yield score: 3

1 Medium

Severity-weighted across the published findings below.

Findings it surfaces (1)

Documented failures this method catches — the evidence it works.

Lost in the middle: degraded recall for mid-context information · Medium
Retrieval accuracy is highest for facts at the start and end of a long context and drops for facts in the middle.
How it found it: Vary needle depth; accuracy dips mid-context.
Reasoning
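One way to turn "accuracy dips mid-context" into a number is to average correctness per needle depth and compare the edges with the middle. The sketch below assumes trials are recorded as `(depth, correct)` pairs; the function names and the 0.25 edge threshold are illustrative choices, not part of the published method.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def accuracy_by_depth(trials: Iterable[Tuple[float, bool]]) -> Dict[float, float]:
    """Group (depth, correct) trials by depth and return the mean accuracy
    for each depth, ordered from shallowest to deepest."""
    buckets = defaultdict(list)
    for depth, correct in trials:
        buckets[depth].append(1.0 if correct else 0.0)
    return {d: mean(v) for d, v in sorted(buckets.items())}


def mid_context_dip(acc: Dict[float, float], edge: float = 0.25) -> float:
    """Return mean accuracy at edge depths minus mean accuracy at middle
    depths. A positive value means the middle is recalled worse.
    The edge threshold (0.25) is an assumption made for this sketch."""
    edges = [a for d, a in acc.items() if d <= edge or d >= 1 - edge]
    middle = [a for d, a in acc.items() if edge < d < 1 - edge]
    if not edges or not middle:
        raise ValueError("need at least one edge depth and one middle depth")
    return mean(edges) - mean(middle)


if __name__ == "__main__":
    # Made-up toy trials, only to show the input shape; not measured results.
    toy_trials = [
        (0.0, True), (0.0, True),
        (0.5, True), (0.5, False),
        (1.0, True), (1.0, True),
    ]
    acc = accuracy_by_depth(toy_trials)
    print(acc)
    print(mid_context_dip(acc))
```

A single difference of means is a coarse summary. Plotting accuracy against depth for each context length usually gives a clearer picture of where the dip starts and how it grows.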

References & further reading

Lost in the Middle: How Language Models Use Long Contexts
Liu et al. · arXiv · July 6, 2023

Cite this

Qlarify Labs. (2026). Needle-in-a-haystack (long-context retrieval). Retrieved from https://labs.qlarify.fi/methods/needle-in-haystack