DeepSeek

DeepSeek-R1

DeepSeek's open-weight reasoning model, built on V3 via reinforcement learning and released January 2025. Its own technical report is unusually candid about limitations — captured below as vendor-acknowledged findings.

Attribution note. These are documented failure-mode classes observed across frontier models and grounded in each finding's cited source; their attribution to this specific version is illustrative. Qlarify Labs has not independently reproduced each finding on DeepSeek-R1; per-version confidence requires reproduction (VERIFICATION §2–4). Open any finding to see its source.

Report card

Auto-derived from 3 linked findings (illustrative version attributions; see note above), showing the worst severity per category.

Reasoning: Medium (1×)
Tool use: Medium (1×)
Other: Low (1×)
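
The "worst severity per category" derivation can be sketched roughly as follows. This is a minimal illustration, not Qlarify Labs' actual pipeline: the severity ordering, the `report_card` function, and the reading of "1×" as "number of findings at the worst severity" are all assumptions, and the input tuples are placeholders that only mirror the rows above.

```python
from collections import defaultdict

# Assumed ordering; the page does not state its full severity scale.
SEVERITY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def report_card(findings):
    """Map each category to (worst severity, count of findings at that severity).

    `findings` is an iterable of (category, severity) pairs.
    """
    by_category = defaultdict(list)
    for category, severity in findings:
        by_category[category].append(severity)

    card = {}
    for category, severities in by_category.items():
        worst = max(severities, key=SEVERITY_ORDER.__getitem__)
        card[category] = (worst, severities.count(worst))
    return card


# Placeholder findings mirroring the three report-card rows.
findings = [("Reasoning", "Medium"), ("Tool use", "Medium"), ("Other", "Low")]
card = report_card(findings)
for category, (severity, count) in card.items():
    print(f"{category}: {severity} ({count}×)")
```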

Strengths

Strong chain-of-thought reasoning on math and code; open weights and an open training recipe; competitive with closed reasoning models at lower cost.

Known weaknesses

Per DeepSeek's own R1 paper, the model is optimized for English and Chinese and may mix languages when queried in others; it is sensitive to prompts, with few-shot prompting degrading performance; and it falls short of V3 on function calling, multi-turn conversation, and JSON output (partly restored in R1-0528). See the linked vendor-acknowledged findings.
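
Two of these weaknesses lend themselves to simple automated checks on saved model outputs. The sketch below is illustrative only: `is_valid_json_object` and `scripts_used` are hypothetical helper names, and the script detection is a rough heuristic based on Unicode character names, not a real language identifier.

```python
import json
import unicodedata


def is_valid_json_object(text: str) -> bool:
    """Return True if the whole text parses as a JSON object (top-level dict)."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def scripts_used(text: str) -> set:
    """Rough heuristic: collect the script prefix of each letter's Unicode name."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if not name:
            continue
        # Names look like "LATIN SMALL LETTER A" or "CJK UNIFIED IDEOGRAPH-4E2D".
        found.add(name.split(" ")[0])
    return found


# A reply that wraps JSON in chatter fails the strict JSON check.
print(is_valid_json_object('{"answer": 42}'))
print(is_valid_json_object('Sure! {"answer": 42}'))

# More than one script in a reply is a cheap signal of possible language mixing.
print(scripts_used("答案 is 42"))
```

A multi-script result is only a hint: quoted terms, names, and code identifiers legitimately mix scripts, so flagged outputs still need human review.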

Findings (3)

Methods that surface these

Differential testing
Integration testing (MCP handshakes & tool contracts)
Metamorphic testing
Perturbation testing
Property-based testing
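
As one illustration of how metamorphic and perturbation testing can surface prompt sensitivity, the sketch below applies meaning-preserving transforms to a prompt and reports the variants whose answer changes. Everything here is an assumption for illustration: `metamorphic_failures` is a hypothetical helper, and `toy_model` is a stand-in so the example runs without a real model client.

```python
from typing import Callable, Iterable, List


def metamorphic_failures(
    model: Callable[[str], str],
    prompt: str,
    transforms: Iterable[Callable[[str], str]],
) -> List[str]:
    """Return the transformed prompts whose answer differs from the baseline."""
    baseline = model(prompt).strip()
    failures = []
    for transform in transforms:
        variant = transform(prompt)
        if model(variant).strip() != baseline:
            failures.append(variant)
    return failures


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in: answers only when the question comes first."""
    return "4" if prompt.startswith("What is 2 + 2?") else "unsure"


def add_trailing_space(prompt: str) -> str:
    """Perturbation that should not change the answer."""
    return prompt + " "


def add_few_shot(prompt: str) -> str:
    """Prepend a worked example; the expected answer stays the same."""
    return "Q: What is 1 + 1?\nA: 2\n\n" + prompt


failures = metamorphic_failures(
    toy_model, "What is 2 + 2?", [add_trailing_space, add_few_shot]
)
print(f"{len(failures)} variant(s) changed the answer")
```

Exact string equality is a deliberately strict comparison; for free-form reasoning output, a real harness would more likely extract and compare the final answer only.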

Related references

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning — arXiv:2501.12948

Versions tracked

deepseek-r1
deepseek-r1-0528

Cite this

Qlarify Labs. (2026). DeepSeek DeepSeek-R1 — known weaknesses. Retrieved from https://labs.qlarify.fi/models/deepseek-r1