Medium · Reasoning · Vendor-acknowledged · Published

Reasoning model degrades under few-shot prompting

DeepSeek-R1's own paper reports that few-shot prompting 'consistently degrades its performance' and recommends zero-shot prompting instead, inverting the usual assumption that examples help.

Published June 26, 2026

Reproducibility: Often
Severity: Medium
Confidence: Vendor-acknowledged

Details

DeepSeek's R1 report states that the model 'is sensitive to prompts' and that 'few-shot prompting consistently degrades its performance.' It recommends a zero-shot setting for optimal results: describe the problem directly and specify the output format. This inverts the common habit of adding worked examples to a prompt, and it means output quality depends on prompt format as well as prompt content. That is exactly the kind of fragility perturbation testing is built to quantify.
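One way to quantify this sensitivity is to score the same questions under a zero-shot and a few-shot prompt and compare the two. The sketch below is a minimal illustration, not the method used for this finding: `build_prompt`, `accuracy`, `few_shot_delta`, the `Q:`/`A:` template, and exact-match scoring are all assumptions, and `toy_model` is a hypothetical stand-in that fails by construction when examples are present. It does not call or simulate DeepSeek-R1.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (question, expected answer)


def build_prompt(question: str, shots: Sequence[Example] = ()) -> str:
    """Zero-shot prompt when `shots` is empty, otherwise a few-shot prompt."""
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def accuracy(
    model: Callable[[str], str],
    items: Sequence[Example],
    shots: Sequence[Example] = (),
) -> float:
    """Exact-match accuracy of `model` over `items` for one prompt format."""
    correct = sum(model(build_prompt(q, shots)).strip() == a for q, a in items)
    return correct / len(items)


def few_shot_delta(
    model: Callable[[str], str],
    items: Sequence[Example],
    shots: Sequence[Example],
) -> float:
    """Few-shot accuracy minus zero-shot accuracy; negative means degradation."""
    return accuracy(model, items, shots) - accuracy(model, items)


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (assumption, for illustration).
    # It answers correctly only when the prompt holds a single question.
    answers = {"2+2": "4", "3*3": "9"}
    if prompt.count("Q: ") > 1:
        return "?"
    question = prompt.rsplit("Q: ", 1)[-1].split("\nA:")[0]
    return answers.get(question, "?")


if __name__ == "__main__":
    items = [("2+2", "4"), ("3*3", "9")]
    shots = [("1+1", "2")]
    print(few_shot_delta(toy_model, items, shots))
```

To run this against a real model, replace `toy_model` with a function that sends the prompt to the model and returns its final answer as a string. Reasoning models usually emit a long trace first, so a real harness would also need an answer-extraction step before the exact-match comparison.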

Evidence

https://arxiv.org/abs/2501.12948
DeepSeek-AI, 'DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning' (2025), Limitations section.

Affected versions

DeepSeek · deepseek-r1

References

Source: https://arxiv.org/abs/2501.12948

Cite this

Qlarify Labs. (2026). Reasoning model degrades under few-shot prompting. Retrieved from https://labs.qlarify.fi/findings/deepseek-r1-prompt-sensitivity