Paper · High credibility · arXiv · Fu et al. · December 1, 2024

Why Do Large Language Models (LLMs) Struggle to Count Letters?

Our summary

Analyses the well-known failure to count letters in a word (e.g. the r's in 'strawberry'), tying it to byte-pair tokenization: characters are grouped into tokens, so the unit being counted is not the unit the model processes. Reported letter-counting accuracy is very low, especially when a letter recurs.
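
To make the mismatch between characters and tokens concrete, here is a minimal sketch. It is not the paper's method and not a real BPE tokenizer: the vocabulary `TOY_VOCAB` and the helper `toy_tokenize` are hypothetical, and the split uses greedy longest-match rather than learned merges. Real tokenizers may split 'strawberry' differently.

```python
# Hypothetical sub-word vocabulary (an assumption for illustration only).
# Real BPE vocabularies are learned from data and will differ.
TOY_VOCAB = {"str": 0, "aw": 1, "berry": 2}


def toy_tokenize(word, vocab):
    """Greedy longest-match split of `word` into pieces found in `vocab`.

    This stands in for a sub-word tokenizer; it is not an implementation
    of byte-pair encoding.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


word = "strawberry"
pieces = toy_tokenize(word, TOY_VOCAB)

# What the model receives: opaque integer IDs, not letters.
token_ids = [TOY_VOCAB[p] for p in pieces]

# What the question asks about: individual characters.
char_count = word.count("r")

# The target letter is spread unevenly across the pieces.
r_per_piece = [p.count("r") for p in pieces]

print(pieces)       # ['str', 'aw', 'berry']
print(token_ids)    # [0, 1, 2]
print(char_count)   # 3
print(r_per_piece)  # [1, 0, 2]
```

In this toy split the three r's sit inside two different tokens, one of which contains the letter twice. Nothing in the ID sequence `[0, 1, 2]` exposes that, so a correct count would depend on what the model has learned about each token's spelling.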

Why it matters

A clean, tokenizer-rooted example of how sub-word representation removes reliable access to individual characters — a structural limit, not a prompting mistake.


Published June 26, 2026

Cite this

Qlarify Labs. (2026). Why Do Large Language Models (LLMs) Struggle to Count Letters? Retrieved from https://labs.qlarify.fi/references/llms-struggle-count-letters-2024