Low, Reasoning, Reviewer-confirmed, Published

Miscounting items in long lists

Counts of items, occurrences, or matches in long inputs drift as list length grows.

Published June 26, 2026

Reproducibility
Often
Severity
Low
Confidence
Reviewer-confirmed

Details

When asked to count the elements of a long list that satisfy a condition, models undercount or overcount, and the error grows with list length. This is a boundary-sensitive reliability issue for data-extraction tasks.
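
One way to check for this drift is to compare a reported count against a deterministic ground truth at several list lengths. The sketch below assumes a hypothetical `report_count` callable standing in for a model call; that name, its signature, and the helper names are illustrative and not part of the finding.

```python
from typing import Callable, Dict, Iterable, Sequence


def true_count(items: Sequence[str], predicate: Callable[[str], bool]) -> int:
    """Deterministic ground truth: number of items satisfying the predicate."""
    return sum(1 for item in items if predicate(item))


def count_errors(
    items: Sequence[str],
    predicate: Callable[[str], bool],
    report_count: Callable[[Sequence[str]], int],  # hypothetical model wrapper
    lengths: Iterable[int],
) -> Dict[int, int]:
    """Map each prefix length to (reported count - true count)."""
    errors: Dict[int, int] = {}
    for n in lengths:
        prefix = items[:n]
        errors[n] = report_count(prefix) - true_count(prefix, predicate)
    return errors
```

A negative value means an undercount at that length and a positive value an overcount. Plotting the values against length would show whether the error grows as the finding describes.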

Evidence

Q: How many items in this 40-item list start with a vowel?
A: Confident count off by 2–3.
Illustrative example — see the linked reference for the documented evidence.
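
The reference answer for a question like the one above can be computed deterministically and compared with the model's reply. This is a minimal sketch, assuming the items are plain strings and that "vowel" means a, e, i, o, or u regardless of case; the function names and the sample list are illustrative.

```python
VOWELS = frozenset("aeiou")


def starts_with_vowel(item: str) -> bool:
    """True if the first non-whitespace character is a, e, i, o, or u."""
    stripped = item.strip()
    return bool(stripped) and stripped[0].lower() in VOWELS


def count_vowel_initial(items) -> int:
    """Count the items whose first character is a vowel."""
    return sum(starts_with_vowel(item) for item in items)


# Illustrative data, not taken from the documented evidence.
sample = ["apple", "banana", "Orange", "umbrella", "kiwi", " egg", ""]
print(count_vowel_initial(sample))  # prints 4
```

For extraction pipelines, delegating the count to code like this avoids depending on the model's own tally.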

Affected versions

Anthropic · claude-opus-4-8
OpenAI · gpt-4o
Google · gemini-2.0-flash
Meta · llama-3.3-70b

References

Source: https://arxiv.org/abs/2410.19730

Cite this

Qlarify Labs. (2026). Miscounting items in long lists. Retrieved from https://labs.qlarify.fi/findings/list-counting-errors