DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Our summary
DeepSeek's technical report for R1, an open-weight reasoning model trained largely via reinforcement learning. Notably candid: its 'Limitations' section enumerates concrete, testable weaknesses (language mixing, prompt sensitivity, and a tool-use regression relative to the base V3 model).
Why it matters
A rare case of a vendor naming specific, reproducible limitations of its own model — primary-source material for vendor-acknowledged findings.
Cited by these methods
Related findings (3)
- Reasoning model mixes languages on non-English/Chinese queriesLow
DeepSeek-R1 is optimized for English and Chinese and can mix languages mid-output on queries in other languages — its own paper flags this.
- Reasoning model degrades under few-shot promptingMedium
DeepSeek-R1's own paper reports that few-shot prompting 'consistently degrades its performance' and recommends zero-shot — inverting the usual assumption that examples help.
- Reasoning model regresses on tool use versus its base modelMedium
DeepSeek-R1 falls short of the base DeepSeek-V3 on function calling, multi-turn, complex role-play and JSON output — a reasoning-tuned model trading away tool-use reliability, later restored in R1-0528.
Published June 26, 2026
Cite this
Qlarify Labs. (2026). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. Retrieved from https://labs.qlarify.fi/references/deepseek-r1-paper-2025