Counts of items, occurrences, or matches in long inputs drift from the true value as input length grows.
Published June 26, 2026
Reproducibility: Often
Severity: Low
Confidence: Reviewer-confirmed
Details
When asked to count the elements in a long list that satisfy a condition, models undercount or overcount, and the error increases with list length. This is a boundary-sensitive reliability issue for data-extraction tasks.
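One way to make the drift measurable is to compare a model-reported count against a deterministic reference count over the same list. The following is a minimal sketch, not part of the original report: the helper names `reference_count` and `count_error`, the sample list, the condition, and the reported value of 140 are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def reference_count(items: Iterable[T], condition: Callable[[T], bool]) -> int:
    """Deterministically count the items that satisfy the condition."""
    return sum(1 for item in items if condition(item))


def count_error(reported: int, items: Iterable[T], condition: Callable[[T], bool]) -> int:
    """Signed error of a reported count: negative is an undercount, positive an overcount."""
    return reported - reference_count(items, condition)


def is_multiple_of_seven(n: int) -> bool:
    return n % 7 == 0


# Hypothetical data: the integers 1..1000, counting multiples of 7.
values = list(range(1, 1001))
truth = reference_count(values, is_multiple_of_seven)

# Hypothetical model-reported count (an assumption, not measured data).
reported = 140
error = count_error(reported, values, is_multiple_of_seven)

print(f"reference={truth} reported={reported} error={error}")
```

Repeating the comparison at several list lengths would show whether the signed error grows with length, as described above.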