Paper · High credibility · arXiv · Lanham et al. (Anthropic) · July 17, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning

Our summary

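Tests whether a model's stated chain-of-thought actually drives its answer by intervening on the reasoning: truncating it, inserting mistakes, paraphrasing it, or replacing it with filler tokens. Finds that reliance on the stated reasoning varies widely by task and that larger models tend to produce less faithful reasoning; in many cases the answer stays the same when the reasoning is perturbed.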
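The perturbation idea in the summary can be illustrated with a minimal sketch of a truncation test. This is not the paper's code. The names `answer_fn` and `truncation_sensitivity`, and the two toy models, are hypothetical stand-ins; in practice `answer_fn` would wrap a call to a real model.

```python
from typing import Callable, Sequence

# Hypothetical interface: given a question and a (possibly truncated) list of
# reasoning steps, return the model's final answer as a string.
AnswerFn = Callable[[str, Sequence[str]], str]


def truncation_sensitivity(
    answer_fn: AnswerFn, question: str, reasoning_steps: Sequence[str]
) -> float:
    """Fraction of truncation points where the answer differs from the
    answer given with the full reasoning. A value near 0 suggests the
    answer does not depend on the stated reasoning."""
    if not reasoning_steps:
        return 0.0
    full_answer = answer_fn(question, reasoning_steps)
    changed = 0
    # Keep only the first k steps, for k = 0 .. len - 1.
    for k in range(len(reasoning_steps)):
        if answer_fn(question, reasoning_steps[:k]) != full_answer:
            changed += 1
    return changed / len(reasoning_steps)


# Toy stand-ins for a model (assumptions for illustration only).
def ignores_reasoning(question: str, steps: Sequence[str]) -> str:
    return "A"  # same answer regardless of the reasoning


def follows_last_step(question: str, steps: Sequence[str]) -> str:
    return steps[-1] if steps else "?"  # answer is read off the last step


steps = ["a", "b", "c"]
score_ignore = truncation_sensitivity(ignores_reasoning, "q", steps)
score_follow = truncation_sensitivity(follows_last_step, "q", steps)
```

Here `score_ignore` is 0.0 (the answer never changes, so the reasoning looks post-hoc) and `score_follow` is 1.0 (every truncation changes the answer). A low score alone is only suggestive, since a model could also reach the same answer because the question is easy.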

Why it matters

Cautions against trusting chain-of-thought (CoT) as a transparent explanation: the visible reasoning may be a post-hoc rationalization of an answer reached some other way.

Cited by these methods

🔬 Chain-of-thought faithfulness probing

Related findings (1)

Unfaithful chain-of-thought reasoning · Medium
The stated step-by-step reasoning does not reflect the actual cause of the answer.

Reasoning failure Safety Evals

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Measuring Faithfulness in Chain-of-Thought Reasoning. Retrieved from https://labs.qlarify.fi/references/measuring-faithfulness-chain-of-thought