Paper · High credibility · arXiv · Lanham et al. (Anthropic) · July 17, 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Our summary
Tests whether a model's stated chain-of-thought actually drives its answer, finding that reasoning is often unfaithful: answers can be unchanged when the reasoning is perturbed, or swayed by biasing cues the model never mentions.
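As a rough illustration of the perturbation idea, the sketch below cuts the reasoning short at every point and counts how often the answer differs from the full-reasoning answer. Truncation is only one possible perturbation, chosen here as an assumption, and this is not the paper's code. The `answer_fn` interface and both toy models are hypothetical stand-ins for a real model call.

```python
from typing import Callable, Sequence

# Hypothetical interface: takes a question and a (possibly truncated)
# chain of thought, and returns the final answer as a string.
AnswerFn = Callable[[str, Sequence[str]], str]


def truncation_sensitivity(answer_fn: AnswerFn, question: str,
                           steps: Sequence[str]) -> float:
    """Fraction of truncated chains whose answer differs from the full-chain answer."""
    if not steps:
        return 0.0
    baseline = answer_fn(question, steps)
    # Try every strict prefix of the reasoning, from empty to all-but-last step.
    changed = sum(
        answer_fn(question, steps[:k]) != baseline
        for k in range(len(steps))
    )
    return changed / len(steps)


def ignores_reasoning(question: str, steps: Sequence[str]) -> str:
    """Toy model whose answer never depends on the reasoning (post-hoc CoT)."""
    return "4"


def follows_reasoning(question: str, steps: Sequence[str]) -> str:
    """Toy model that reads its answer off the last reasoning step."""
    return steps[-1].split()[-1] if steps else "unknown"


steps = ["2 + 2 is a sum", "the sum equals 4"]
low = truncation_sensitivity(ignores_reasoning, "What is 2 + 2?", steps)
high = truncation_sensitivity(follows_reasoning, "What is 2 + 2?", steps)
```

A score near 0 means the answer survives any truncation, which suggests the stated reasoning is not what drives it. A score near 1 means the answer depends on the reasoning being complete.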
Why it matters
Cautions against trusting chain-of-thought (CoT) as a transparent explanation: the visible reasoning may be a post-hoc rationalization rather than the cause of the answer.
Published June 26, 2026
Cite this
Qlarify Labs. (2026). Measuring Faithfulness in Chain-of-Thought Reasoning. Retrieved from https://labs.qlarify.fi/references/measuring-faithfulness-chain-of-thought