Anthropic

Claude Sonnet

Anthropic's balanced Claude tier — strong capability at lower latency and cost than Opus. Linked findings reflect documented frontier failure-mode classes; per-version attribution is illustrative.

Attribution note. These are documented failure-mode classes observed across frontier models and grounded in each finding's cited source; their attribution to this specific version is illustrative. Qlarify Labs has not independently reproduced each finding on Claude Sonnet, and per-version confidence requires reproduction (VERIFICATION §2–4). Open any finding to see its source.

Report card

Auto-derived from the 7 linked findings by taking the worst severity in each category; the × figure is the number of findings in that category. Version attributions are illustrative (see the note above).

Hallucination: High (1×)
Reasoning: Medium (4×)
Refusal: Medium (1×)
Bias: Medium (1×)
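The report-card rule (worst severity per category, plus a per-category count) can be sketched as below. This is a minimal illustration, not Qlarify Labs' actual pipeline: the severity scale `SEVERITY_ORDER` and the `(category, severity)` input shape are assumptions.

```python
from collections import Counter

# Assumed ordinal severity scale; the real scale used by the site may differ.
SEVERITY_ORDER = {"Low": 0, "Medium": 1, "High": 2, "Critical": 3}


def report_card(findings):
    """Return {category: (worst_severity, count)} for (category, severity) pairs."""
    worst = {}
    counts = Counter()
    for category, severity in findings:
        counts[category] += 1
        # Keep the highest-ranked severity seen so far for this category.
        if category not in worst or SEVERITY_ORDER[severity] > SEVERITY_ORDER[worst[category]]:
            worst[category] = severity
    return {cat: (worst[cat], counts[cat]) for cat in worst}


def format_card(card):
    """Render each category as 'Category: Severity (n×)'."""
    return [f"{cat}: {sev} ({n}×)" for cat, (sev, n) in card.items()]
```

Taking the maximum rather than an average means a single high-severity finding sets the category grade, which matches the "worst severity per category" wording.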

Strengths

Fast, capable general reasoning and coding; good instruction-following and comparatively calibrated refusals for its tier.

Known weaknesses

Shares the frontier-wide arithmetic, counting and tokenization limits; susceptible to sycophancy under user pressure and to prompt injection in agentic settings.
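Counting weaknesses are typically surfaced with a ground-truth oracle: compute the correct answer deterministically and compare it to the model's reply. The sketch below is a hypothetical grader for a letter-counting probe; the convention of reading the first integer in the reply is an assumption, and no model call is included.

```python
import re


def letter_count(word: str, letter: str) -> int:
    """Ground truth: case-insensitive count of `letter` in `word`."""
    return word.lower().count(letter.lower())


def grade_count_answer(word: str, letter: str, model_answer: str) -> bool:
    """Hypothetical grader: compare the first integer in the reply to the oracle."""
    match = re.search(r"-?\d+", model_answer)
    if match is None:
        return False  # no number in the reply counts as a failure
    return int(match.group()) == letter_count(word, letter)
```

For example, a reply of "There are 2 r's" to "How many r's are in strawberry?" is graded as wrong, because the oracle returns 3.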

Findings (7)

Methods that surface these

A/B testing in production
Adversarial prompting
Benchmark evaluation
Boundary & edge-case testing
Counterfactual bias probing
Differential testing
Factual oracle verification
Hallucination triggering
Logic & consistency testing
Metamorphic testing
Perturbation testing
Property-based testing
Self-consistency probing
Threshold testing

Related references

Faith and Fate: Limits of Transformers on Compositionality — arXiv
Hallucination Detection in Large Language Models with Metamorphic Relations — arXiv
Metamorphic Testing of Large Language Models for Natural Language Processing — arXiv
Survey of Hallucination in Natural Language Generation — ACM Computing Surveys
Towards Understanding Sycophancy in Language Models — arXiv
TruthfulQA: Measuring How Models Mimic Human Falsehoods — arXiv
Why Do Large Language Models (LLMs) Struggle to Count Letters? — arXiv
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models — arXiv

Versions tracked

claude-sonnet-4-6

Cite this

Qlarify Labs. (2026). Anthropic Claude Sonnet — known weaknesses. Retrieved from https://labs.qlarify.fi/models/claude-sonnet