Low · Reasoning · Reviewer-confirmed · Published

Miscounting items in long lists

Counts of items, occurrences, or matches in long inputs drift as list length grows.

Published June 26, 2026

Reproducibility: Often
Severity: Low
Confidence: Reviewer-confirmed

Details

When asked to count the elements that satisfy a condition in a long list, models undercount or overcount, and the error increases with list length. This is a boundary-sensitive reliability issue for data-extraction tasks.

Found with

🔬 Boundary & edge-case testing

Error grows with list length.

🔬 Property-based testing
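
A minimal sketch of how the two testing approaches above might be combined: sweep over list lengths (including the boundary cases 0 and 1), generate random lists, and compare a counter against a deterministic ground truth. Everything here is an assumption for illustration, not the harness used for this finding: the names `sweep_count_errors`, `make_list`, and `exact_counter` are hypothetical, and `count_fn` stands in for whatever wraps a real model call.

```python
import random

def ground_truth(items, predicate):
    """Deterministic reference count of items satisfying predicate."""
    return sum(1 for item in items if predicate(item))

def starts_with_vowel(word):
    """True if the word is non-empty and begins with a, e, i, o, or u."""
    return bool(word) and word[0].lower() in "aeiou"

def make_list(length, rng):
    """Build a random list of lowercase 5-letter strings."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return ["".join(rng.choice(letters) for _ in range(5)) for _ in range(length)]

def sweep_count_errors(count_fn, lengths, trials=20, seed=0):
    """Return {length: mean absolute error of count_fn vs. the ground truth}."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    results = {}
    for length in lengths:
        total_error = 0
        for _ in range(trials):
            items = make_list(length, rng)
            expected = ground_truth(items, starts_with_vowel)
            total_error += abs(count_fn(items) - expected)
        results[length] = total_error / trials
    return results

# Hypothetical stand-in for a model call: an exact counter, so every error is 0.
def exact_counter(items):
    return ground_truth(items, starts_with_vowel)

errors = sweep_count_errors(exact_counter, lengths=[0, 1, 10, 40, 200])
```

Replacing `exact_counter` with a function that sends the list to a model and parses the returned number would let the per-length error be plotted against list length.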

Evidence

Q: How many items in this 40-item list start with a vowel?
A: Confident count off by 2–3.

Illustrative example — see the linked reference for the documented evidence.
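
For a query like the one above, a reference count can be computed deterministically and compared with the model's answer. The sketch below is illustrative only: the 40-item list is made up (eight sample words repeated five times), and `count_vowel_initial` is a hypothetical helper, not part of the documented evidence.

```python
VOWELS = frozenset("aeiou")

def count_vowel_initial(items):
    """Count items whose first non-space character is a vowel (case-insensitive)."""
    count = 0
    for item in items:
        text = item.strip()
        if text and text[0].lower() in VOWELS:
            count += 1
    return count

# Made-up 40-item list: eight sample words repeated five times.
sample_words = ["apple", "banana", "orange", "kiwi", "Umbrella", "grape", "elm", "fig"]
items = sample_words * 5

# Reference value to compare against a model's reported count.
reference_count = count_vowel_initial(items)
```

In a data-extraction pipeline, delegating the count to code like this and using the model only to identify or label items is one way to avoid relying on the model's own tally.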

Affected versions

Anthropic · claude-opus-4-8
OpenAI · gpt-4o
Google · gemini-2.0-flash
Meta · llama-3.3-70b

References

Counting Ability of Large Language Models and Impact of Tokenization

Reasoning failure

Source: https://arxiv.org/abs/2410.19730

Cite this

Qlarify Labs. (2026). Miscounting items in long lists. Retrieved from https://labs.qlarify.fi/findings/list-counting-errors