Paper · High credibility · ACL 2022 (arXiv:2109.07958) · Stephanie Lin, Jacob Hilton, Owain Evans · September 8, 2021

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Our summary

A benchmark of questions deliberately crafted to invite false answers rooted in common misconceptions. The best model tested was truthful on only 58% of questions, against 94% for humans.

Why it matters

The template for hallucination triggering: purpose-built prompts that steer a model toward confident fabrication, so you can map where fabrication occurs.

Cited by these methods

🔬 Hallucination triggering

Published June 26, 2026

Cite this

Qlarify Labs. (2026). TruthfulQA: Measuring How Models Mimic Human Falsehoods. Retrieved from https://labs.qlarify.fi/references/truthfulqa-2022