Low · Other · Vendor-acknowledged · Published

Reasoning model mixes languages on non-English/Chinese queries

DeepSeek-R1 is optimized for English and Chinese and can mix languages mid-output on queries in other languages — its own paper flags this.

Published June 26, 2026

Reproducibility: Sometimes
Severity: Low
Confidence: Vendor-acknowledged

Details

DeepSeek's R1 report states that the model 'is currently optimized for Chinese and English, which may result in language mixing issues when handling queries in other languages.' Language mixing surfaced during reinforcement-learning training, and the language-consistency reward added to counter it came at a slight cost in reasoning performance. A reasoning chain that drifts between languages is a concrete reliability and usability failure for multilingual deployments.
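One way to make "language mixing" measurable is a consistency score in the spirit of the reward the report describes (the proportion of target-language words in the chain of thought). The sketch below is not DeepSeek's implementation. It assumes Unicode script is an acceptable proxy for language, which holds for pairs such as Russian and English but cannot separate languages that share a script, such as German and English. The function names are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Dict, Optional


def char_script(ch: str) -> Optional[str]:
    """Crude script label taken from the Unicode character name.

    Returns None for anything that is not a letter (digits, spaces, punctuation).
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if not name:
        return None
    if name.startswith("CJK"):
        return "CJK"
    # For most letters the first word of the name is the script,
    # e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER DE".
    return name.split(" ")[0]


def script_shares(text: str) -> Dict[str, float]:
    """Fraction of letters belonging to each script; empty dict if no letters."""
    counts = Counter(s for s in map(char_script, text) if s is not None)
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()} if total else {}


def consistency_score(text: str, target_script: str) -> float:
    """Share of letters written in the target script (assumed proxy for language)."""
    return script_shares(text).get(target_script, 0.0)
```

A score near 1.0 suggests the output stayed in one script; a noticeably lower score is a cue to inspect the output by hand. For Latin-script languages a real language identifier would be needed in place of `char_script`.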

Found with

🔬 Differential testing

Answers to the same query posed in different languages diverge, or a single output mixes languages.
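A differential check of this kind can be sketched as follows. The harness asks the same arithmetic question in several languages and compares the final integer in each answer. `stub_model` and its canned replies are hypothetical placeholders, not real DeepSeek-R1 output, and extracting the last integer is an assumption that only suits questions with a numeric answer.

```python
import re
from typing import Callable, Dict, Optional


def final_integer(answer: str) -> Optional[int]:
    """Return the last integer mentioned in the answer, or None if there is none."""
    matches = re.findall(r"-?\d+", answer)
    return int(matches[-1]) if matches else None


def differential_test(
    ask: Callable[[str], str], variants: Dict[str, str]
) -> Dict[str, Optional[int]]:
    """Ask every language variant and collect the extracted final answers."""
    return {lang: final_integer(ask(prompt)) for lang, prompt in variants.items()}


def diverging(results: Dict[str, Optional[int]]) -> bool:
    """True when the variants do not all agree on one answer."""
    return len(set(results.values())) > 1


# Hypothetical stand-in for a real model call; replace with your own client.
def stub_model(prompt: str) -> str:
    canned = {
        "What is 17 + 25?": "17 + 25 = 42",
        "Was ist 17 + 25?": "17 + 25 = 42",
        "Paljonko on 17 + 25?": "17 + 25 on 43",  # deliberately wrong, for illustration
    }
    return canned[prompt]


VARIANTS = {
    "en": "What is 17 + 25?",
    "de": "Was ist 17 + 25?",
    "fi": "Paljonko on 17 + 25?",
}

results = differential_test(stub_model, VARIANTS)
print(results, "diverging:", diverging(results))
```

Comparing only the final answer keeps the check independent of the wording of each reply, so a divergence points at the reasoning rather than at phrasing differences between languages.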

🔬 Metamorphic testing

Translating the query should not change the language consistency of the answer; when it does, the relation is violated.
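The relation can be written down as a small check, sketched here under stated assumptions: an answer counts as "consistent" when at least 90% of its letters share one Unicode script, and the relation holds when the source query and its translation get the same verdict. The threshold, the script-based proxy, and `stub_model` with its canned replies are all hypothetical choices for illustration.

```python
import unicodedata
from collections import Counter
from typing import Callable


def is_consistent(text: str, threshold: float = 0.9) -> bool:
    """True when one script accounts for at least `threshold` of the letters."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts["CJK" if name.startswith("CJK") else name.split(" ")[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return True  # no letters, so nothing to mix
    return max(counts.values()) / total >= threshold


def relation_holds(ask: Callable[[str], str], source: str, translated: str) -> bool:
    """Metamorphic relation: translating the query must not change consistency."""
    return is_consistent(ask(source)) == is_consistent(ask(translated))


# Hypothetical stand-in for a real model call; the replies are invented.
def stub_model(prompt: str) -> str:
    canned = {
        "What is 17 + 25?": "17 + 25 = 42, so the answer is 42.",
        "Сколько будет 17 + 25?": (
            "Сначала складываем 17 и 25. So the total is 42, то есть ответ 42."
        ),
    }
    return canned[prompt]


holds = relation_holds(stub_model, "What is 17 + 25?", "Сколько будет 17 + 25?")
print("relation holds:", holds)
```

No reference answer is needed: the check only compares the model's behaviour on the source query with its behaviour on the translated one, which is what makes it usable on queries without a known ground truth.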

Evidence

https://arxiv.org/abs/2501.12948

DeepSeek-AI, 'DeepSeek-R1' (2025), Limitations section.

Affected versions

DeepSeek · deepseek-r1

References

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Reasoning failure · Multilingual

Source: https://arxiv.org/abs/2501.12948

Cite this

Qlarify Labs. (2026). Reasoning model mixes languages on non-English/Chinese queries. Retrieved from https://labs.qlarify.fi/findings/deepseek-r1-language-mixing