← Findings
MediumOtherReviewer-confirmedPublished

Unfaithful chain-of-thought reasoning

The stated step-by-step reasoning does not reflect the actual cause of the answer.

Published June 26, 2026

Reproducibility
Sometimes
Severity
Medium
Confidence
Reviewer-confirmed

Details

Models can be steered by biasing hints while presenting reasoning that omits the real cause, or reach the same answer regardless of their stated steps. Explanations therefore cannot be assumed to be faithful accounts of the computation.

Found with

Evidence

https://arxiv.org/abs/2305.04388
Turpin et al., 'Language Models Don't Always Say What They Think' (2023)

Affected versions

Anthropic · claude-opus-4-8OpenAI · gpt-4oOpenAI · o3

References

Source: https://arxiv.org/abs/2307.13702

Cite this

Qlarify Labs. (2026). Unfaithful chain-of-thought reasoning. Retrieved from https://labs.qlarify.fi/findings/cot-unfaithfulness