Medium · Tool use · Vendor-acknowledged · Published

Reasoning model regresses on tool use versus its base model

DeepSeek-R1 falls short of its base model, DeepSeek-V3, on function calling, multi-turn tasks, complex role-playing and JSON output. The reasoning-tuned model traded away tool-use reliability; the R1-0528 update later restored function calling and JSON output.

Published June 26, 2026

Reproducibility: Often
Severity: Medium
Confidence: Vendor-acknowledged

Details

DeepSeek's R1 report states: 'the capabilities of DeepSeek-R1 fall short of DeepSeek-V3 in tasks such as function calling, multi-turn, complex role-playing, and JSON output.' Reasoning-focused training regressed the structured-output and agentic capabilities that the base model already had. This is a documented capability trade-off, and the later R1-0528 update restored function calling and JSON output support.
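A regression of this kind can be checked mechanically by scoring how often a model's raw output parses as a well-formed tool call. The following is a minimal sketch of such a check, not the method used for this finding. The `get_weather` tool, its argument schema, and the `{"name": ..., "arguments": ...}` call format are hypothetical, and the strings in the usage note are hand-written fixtures rather than real model outputs.

```python
import json

# Hypothetical tool registry (an assumption for illustration):
# tool name -> required argument names and their expected Python types.
TOOLS = {"get_weather": {"city": str, "unit": str}}


def is_valid_tool_call(raw: str, tools=TOOLS) -> bool:
    """Return True if `raw` is a JSON object that names a known tool
    and supplies exactly the expected, correctly typed arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # prose, truncated JSON, or markdown-wrapped output
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return False
    spec = tools.get(name)
    if spec is None:
        return False  # unknown (hallucinated) tool name
    if set(args) != set(spec):
        return False  # missing or extra arguments
    return all(isinstance(args[key], typ) for key, typ in spec.items())


def pass_rate(outputs) -> float:
    """Fraction of raw outputs that are valid tool calls."""
    if not outputs:
        return 0.0
    return sum(is_valid_tool_call(o) for o in outputs) / len(outputs)
```

Running `pass_rate` over the outputs of two model versions for the same prompts gives a directly comparable number per version. For example, `is_valid_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo", "unit": "C"}}')` is accepted, while the same JSON preceded by a sentence of prose is rejected because the whole string no longer parses.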

Evidence

https://arxiv.org/abs/2501.12948
DeepSeek-AI, 'DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning' (2025), Limitations section.

Affected versions

DeepSeek · deepseek-r1

Across model versions

First observed in: DeepSeek-R1 · deepseek-r1
Fixed in: DeepSeek-R1 · deepseek-r1-0528

A finding is a claim about a specific model version at a point in time. Fixes can come undone; the method that found it is how you’d catch it again.

References

Source: https://arxiv.org/abs/2501.12948

Cite this

Qlarify Labs. (2026). Reasoning model regresses on tool use versus its base model. Retrieved from https://labs.qlarify.fi/findings/deepseek-r1-tool-use-regression