Anthropic

Claude Opus

Anthropic's most capable model line, aimed at complex reasoning, coding and long-document work. The known weaknesses below are drawn from the linked findings and reflect documented failure-mode classes, not exhaustive testing of every version.

Attribution note. These are documented failure-mode classes observed across frontier models and grounded in each finding's cited source; their attribution to this specific version is illustrative. Qlarify Labs has not independently reproduced each finding on Claude Opus; per-version confidence requires reproduction (VERIFICATION §2–4). Open any finding to see its source.

Report card

Auto-derived from 23 linked findings (illustrative version attributions — see note above) — worst severity per category.

Safety: Critical, 1×
Prompt injection: Critical, 1×
Hallucination: High, 4×
Bias: High, 2×
Jailbreak: High, 2×
Tool use: High, 1×
Reasoning: Medium, 10×
Refusal: Medium, 1×
Other: Medium, 1×
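
The "worst severity per category" roll-up behind the report card can be sketched roughly as follows. This is a minimal illustration, not Qlarify Labs' actual implementation: the severity ordering, the `report_card` function name, and the sample findings are all assumptions, and the count is assumed to be the number of linked findings in each category.

```python
from collections import defaultdict

# Assumed severity scale, lowest to highest (the source does not state one).
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def report_card(findings):
    """Map each category to (worst severity seen, number of findings).

    `findings` is an iterable of (category, severity) pairs.
    """
    worst = {}
    counts = defaultdict(int)
    for category, severity in findings:
        counts[category] += 1
        # Keep the highest-ranked severity seen so far for this category.
        if category not in worst or RANK[severity] > RANK[worst[category]]:
            worst[category] = severity
    return {category: (worst[category], counts[category]) for category in worst}


# Hypothetical findings, for illustration only.
sample = [
    ("Hallucination", "Medium"),
    ("Hallucination", "High"),
    ("Refusal", "Medium"),
]

card = report_card(sample)
for category, (severity, count) in card.items():
    print(f"{category}: {severity}, {count}×")
```

Under these assumptions a category with one High and one Medium finding is reported as High with a count of 2, which is why a single severe finding sets a category's grade regardless of how many milder ones accompany it.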

Strengths

Strong multi-step reasoning and coding; comparatively calibrated refusals; long-context handling.

Known weaknesses

Shares the frontier-wide arithmetic, counting and spatial-reasoning limits; susceptible to sycophancy under user pressure and to prompt injection in agentic settings.

Findings (23)

Methods that surface these

Related references

Versions tracked

claude-opus-4-6, claude-opus-4-7, claude-opus-4-8

Cite this

Qlarify Labs. (2026). Anthropic Claude Opus — known weaknesses. Retrieved from https://labs.qlarify.fi/models/claude-opus