Paper · High credibility · arXiv:2501.12948 · DeepSeek-AI · January 22, 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Our summary

DeepSeek's technical report for R1, an open-weight reasoning model trained largely via reinforcement learning. Notably candid: its 'Limitations' section enumerates concrete, testable weaknesses (language mixing, prompt sensitivity, and a tool-use regression relative to DeepSeek-V3).

Why it matters

A rare case of a vendor naming specific, reproducible limitations of its own model, which makes it primary-source material for vendor-acknowledged findings.

Cited by these methods

🔬 Differential testing
🔬 Integration testing (MCP handshakes & tool contracts)

Related findings (3)

Published June 26, 2026

Cite this

Qlarify Labs. (2026). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Retrieved from https://labs.qlarify.fi/references/deepseek-r1-paper-2025