Paper · High credibility · arXiv · Fu et al. · December 1, 2024

Why Do Large Language Models (LLMs) Struggle to Count Letters?

Our summary

Analyses the well-known failure to count letters in a word (e.g. the r's in 'strawberry'), tying it to byte-pair tokenization: characters are grouped into tokens, so the unit being counted is not the unit the model processes. Reported letter-counting accuracy is very low, especially when a letter recurs.
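The mismatch between tokens and characters can be illustrated with a minimal sketch. It is not the paper's method or a real tokenizer: `TOY_VOCAB` and its three-piece split of 'strawberry' are hypothetical, and `toy_tokenize` uses greedy longest-match rather than learned byte-pair merges. It only shows that the token IDs a model receives carry no explicit spelling, so a letter count needs each piece's characters recovered first.

```python
from collections import Counter

# Hypothetical sub-word vocabulary (an assumption for illustration only).
# Real BPE vocabularies are learned from data and may split the word differently.
TOY_VOCAB = {"str": 0, "aw": 1, "berry": 2}


def toy_tokenize(word, vocab):
    """Split `word` into vocabulary pieces by greedy longest match.

    This is a stand-in for byte-pair tokenization, not an implementation of it.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


def count_letter(word, letter):
    """Count occurrences of `letter` using character-level access."""
    return Counter(word)[letter]


word = "strawberry"
pieces = toy_tokenize(word, TOY_VOCAB)
token_ids = [TOY_VOCAB[p] for p in pieces]

# The model-side view is `token_ids`: opaque integers with no spelling attached.
# Counting 'r' requires mapping each ID back to its characters.
per_piece = [p.count("r") for p in pieces]

print("pieces:    ", pieces)
print("token ids: ", token_ids)
print("r per piece:", per_piece)
print("total r:   ", sum(per_piece), "| character-level count:", count_letter(word, "r"))
```

With this toy split the letter recurs across two different pieces, which is the situation the summary flags as especially error-prone.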

Why it matters

A clean, tokenizer-rooted example of how sub-word representation removes reliable access to individual characters — a structural limit, not a prompting mistake.

Related findings (1)

Character-counting errors in tokenized words · Low
Models miscount letters within a word (e.g. how many 'r's are in a given word) because they reason over tokens, not characters.

Reasoning failure · Evals

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Why Do Large Language Models (LLMs) Struggle to Count Letters? Retrieved from https://labs.qlarify.fi/references/llms-struggle-count-letters-2024