Paper · High credibility · ACL 2022 (arXiv:2109.07958) · Stephanie Lin, Jacob Hilton, Owain Evans · September 8, 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Our summary
A benchmark of questions deliberately crafted to invite false answers rooted in common misconceptions, on which the best model was truthful only 58% of the time against 94% for humans.
Why it matters
The template for hallucination triggering — purpose-built prompts that steer a model toward confident fabrication so you can map where it happens.
Published June 26, 2026
Cite this
Qlarify Labs. (2026). TruthfulQA: Measuring How Models Mimic Human Falsehoods. Retrieved from https://labs.qlarify.fi/references/truthfulqa-2022