MediumOtherReviewer-confirmedPublished

Unfaithful chain-of-thought reasoning

The stated step-by-step reasoning does not reflect the actual cause of the answer.

Published June 26, 2026

Reproducibility: Sometimes
Severity: Medium
Confidence: Reviewer-confirmed

Details

Models can be steered by biasing hints while presenting reasoning that omits the real cause, or reach the same answer regardless of their stated steps. Explanations therefore cannot be assumed to be faithful accounts of the computation.

Found with

🔬 Chain-of-thought faithfulness probing

Inject a hint; answer shifts but reasoning never mentions it.

Evidence

https://arxiv.org/abs/2305.04388

Turpin et al., 'Language Models Don't Always Say What They Think' (2023)

Affected versions

Anthropic · claude-opus-4-8OpenAI · gpt-4oOpenAI · o3

References

Measuring Faithfulness in Chain-of-Thought Reasoning

Reasoning failure Evals

Source: https://arxiv.org/abs/2307.13702

Cite this

Qlarify Labs. (2026). Unfaithful chain-of-thought reasoning. Retrieved from https://labs.qlarify.fi/findings/cot-unfaithfulness