
Character-counting errors in tokenized words

Models miscount letters within a word (e.g. how many 'r's are in a given word) because they reason over tokens, not characters.

Published June 26, 2026

Reproducibility: Often
Severity: Low
Confidence: Reviewer-confirmed

Details

Because text is processed as sub-word tokens, models lack reliable access to individual characters. Requests for letter counts, string reversal, or character-level edits often produce confident but wrong answers. The error is structural to tokenization, not a knowledge gap.
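To make the token-versus-character gap concrete, here is a minimal sketch. The three-piece split and the integer IDs are hypothetical, chosen only for illustration; real tokenizers may segment the word differently. The model receives only the IDs, while the letter count depends on the spelling hidden behind each ID.

```python
# Hypothetical sub-word vocabulary; real tokenizers may split differently.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_encode(pieces):
    """Map sub-word pieces to opaque integer IDs, as a model would receive them."""
    return [TOY_VOCAB[p] for p in pieces]


def count_letter(word, letter):
    """Ground-truth count computed directly over characters."""
    return word.count(letter)


pieces = ["str", "aw", "berry"]
token_ids = toy_encode(pieces)
word = "".join(pieces)

print(token_ids)                # [101, 102, 103]
print(count_letter(word, "r"))  # 3
```

Nothing in the ID sequence itself says that token 101 contains one "r" and token 103 contains two; that information has to be recalled rather than read off the input.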

Found with

🔬 Boundary & edge-case testing

The failure rate rises for longer words with repeated letters.

🔬 Perturbation testing

Perturbing the target word (case, spacing, repeated letters) shows the count is not robust to trivial, meaning-preserving changes.
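A perturbation test of this kind could be sketched as follows. The function name and the particular set of variants are assumptions made for illustration, not the harness used for this finding. Each variant is paired with its ground-truth count so a model's answer can be checked against it.

```python
def perturbations(word, letter):
    """Return named variants of `word`, each with its case-insensitive count of `letter`."""
    variants = {
        "original": word,
        "upper": word.upper(),                        # case change
        "spaced": " ".join(word),                     # one space between characters
        "doubled": word.replace(letter, letter * 2),  # repeat each target letter
    }
    return {
        name: (text, text.lower().count(letter.lower()))
        for name, text in variants.items()
    }


cases = perturbations("strawberry", "r")
for name, (text, truth) in cases.items():
    print(f"{name}: {text!r} -> {truth}")
```

The case and spacing variants keep the true count unchanged, so any change in the model's answer on those inputs indicates a lack of robustness. The doubled variant changes the true count and checks whether the answer tracks it.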

Evidence

Q: How many times does the letter "r" appear in "strawberry"?
A: The letter "r" appears 2 times. (Correct answer: 3)

Illustrative example — see the linked reference for the documented evidence.
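The check behind the example can be sketched in a few lines. The helper names are hypothetical, and taking the first integer in the reply as the reported count is a simplifying assumption about the answer format.

```python
import re


def expected_count(word, letter):
    """Ground truth: count the letter directly over characters."""
    return word.count(letter)


def parse_count(answer):
    """Extract the first integer from a model answer, or None if there is none."""
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None


answer = 'The letter "r" appears 2 times.'
reported = parse_count(answer)
expected = expected_count("strawberry", "r")

print(reported, expected, reported == expected)  # 2 3 False
```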

Affected versions

Anthropic · claude-opus-4-8
Anthropic · claude-sonnet-4-6
OpenAI · gpt-4o
Google · gemini-2.0-flash
Meta · llama-3.3-70b
Mistral · mistral-large-2

References

Why Do Large Language Models (LLMs) Struggle to Count Letters?

Reasoning failure

Source: https://arxiv.org/abs/2412.18626

Cite this

Qlarify Labs. (2026). Character-counting errors in tokenized words. Retrieved from https://labs.qlarify.fi/findings/character-counting-errors