Paper · High credibility · arXiv · Fu et al. · December 1, 2024
Why Do Large Language Models (LLMs) Struggle to Count Letters?
Our summary
Analyses the well-known failure to count letters in a word (e.g. the r's in 'strawberry'), tying it to byte-pair tokenization: characters are grouped into tokens, so the unit being counted is not the unit the model processes. Reported letter-counting accuracy is very low, especially when a letter recurs.
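The mismatch between counted unit and processed unit can be sketched in a few lines. This is a toy illustration, not the paper's method or any real tokenizer: `TOY_VOCAB`, `toy_tokenize`, and `token_ids` are hypothetical names, and greedy longest-match segmentation stands in for byte-pair merging.

```python
# Toy illustration only: a hypothetical vocabulary with greedy longest-match
# segmentation, used as a stand-in for a real byte-pair tokenizer.
TOY_VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return tokens


def token_ids(word, vocab=TOY_VOCAB):
    """Map a word to the integer IDs a model would actually receive."""
    return [vocab[t] for t in toy_tokenize(word, vocab)]


word = "strawberry"
pieces = toy_tokenize(word)      # multi-character pieces, not letters
ids = token_ids(word)            # opaque integers; the spelling is not visible
char_count = word.count("r")     # counting over characters, as a human would
# Counting tokens that *are* the letter finds none, because every 'r'
# is buried inside a larger piece.
whole_token_matches = sum(1 for p in pieces if p == "r")
```

Under this toy vocabulary the word becomes two pieces, so the three r's are only recoverable if the model has separately learned how each piece is spelled.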
Why it matters
A clean, tokenizer-rooted example of how sub-word representation removes reliable access to individual characters: a structural limit, not a prompting mistake.
Published June 26, 2026
Cite this
Qlarify Labs. (2026). Why Do Large Language Models (LLMs) Struggle to Count Letters? Retrieved from https://labs.qlarify.fi/references/llms-struggle-count-letters-2024