Paper · High credibility · arXiv · Xiong et al. · June 1, 2023

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Our summary

An empirical evaluation of confidence elicitation across LLMs. It finds that models verbalize high confidence even when wrong and are generally overconfident, plausibly imitating human patterns of asserting certainty.

Why it matters

Stated certainty is a poor signal of correctness, so any system that gates on a model's self-reported confidence inherits that miscalibration.
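
One common way to quantify this kind of miscalibration is expected calibration error (ECE): group answers by stated confidence, then take the weighted average gap between mean confidence and accuracy in each group. The sketch below is illustrative only. The function name, the 10-bin default, and the sample data are our assumptions, not the paper's evaluation code or results.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean |avg confidence - accuracy| over equal-width bins.

    confidences: stated confidence per answer, each in [0, 1].
    correct:     whether each answer was actually right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    # Assign each sample to an equal-width confidence bin; 1.0 goes in the top bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for samples in bins:
        if not samples:
            continue
        avg_conf = sum(conf for conf, _ in samples) / len(samples)
        accuracy = sum(1 for _, ok in samples if ok) / len(samples)
        ece += (len(samples) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical data: the model states 90% confidence on every answer
# but is right only half the time.
stated = [0.9, 0.9, 0.9, 0.9]
was_right = [True, False, True, False]
print(expected_calibration_error(stated, was_right))
```

On this made-up data the gap is about 0.4, and a gate that accepts any answer with stated confidence of at least 0.8 would let all four answers through, including the two wrong ones.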

Related findings (1)

Poor uncertainty calibration / overconfidence · Medium
Stated confidence does not track accuracy; models sound equally certain when right and wrong.

Hallucination · Reasoning failure · Evals

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. Retrieved from https://labs.qlarify.fi/references/llm-confidence-elicitation-2023