Medium · Hallucination · Reviewer-confirmed · Published

Poor uncertainty calibration / overconfidence

Stated confidence does not track accuracy; models sound equally certain when right and wrong.

Published June 26, 2026

Reproducibility: Often
Severity: Medium
Confidence: Reviewer-confirmed

Details

Verbalized confidence correlates only weakly with correctness, and the probabilities a model states are often miscalibrated, typically erring toward overconfidence. As a result, users cannot rely on tone or stated certainty to gauge how trustworthy an answer is.
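One common way to quantify this kind of miscalibration is expected calibration error (ECE): group answers by stated confidence, then compare the average confidence in each group with the fraction that were actually correct. The sketch below is a minimal illustration, not the method used for this finding. It assumes confidences have already been parsed into numbers in [0, 1] and that each answer has been graded; the function name, the equal-width binning, and the sample data are all hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between stated confidence and accuracy, per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    # Each bin accumulates [sum of confidences, number correct, sample count].
    bins = [[0.0, 0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0
        # falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += conf
        bins[idx][1] += int(ok)
        bins[idx][2] += 1

    total = len(confidences)
    ece = 0.0
    for conf_sum, n_correct, count in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = n_correct / count
        ece += (count / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical data: the model states 0.95 on every answer
# but is right only half the time.
stated = [0.95, 0.95, 0.95, 0.95]
graded = [True, False, True, False]
ece_example = expected_calibration_error(stated, graded)
```

A well-calibrated model would score near 0. In the hypothetical data above, every answer lands in one bin with average confidence 0.95 and accuracy 0.5, so the score is the gap between the two.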

Found with

Evidence

The model says 'I'm certain' about an answer that is wrong and that changes when the same prompt is resampled.
This is an illustrative example; see the linked reference for the documented evidence.
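The resampling symptom can be checked mechanically: ask the same question several times and measure how often the most frequent answer appears. The sketch below assumes the sampled answers have already been collected as strings; the helper names, the normalization rule, and the sample answers are hypothetical. Agreement is only a proxy, since a model can give the same wrong answer every time, so a high agreement rate does not by itself show that an answer is correct.

```python
from collections import Counter
from typing import Sequence, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(answer.strip().lower().split())


def resampling_agreement(answers: Sequence[str]) -> Tuple[str, float]:
    """Return the most frequent normalized answer and its share of samples."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


def overconfidence_gap(stated_confidence: float, answers: Sequence[str]) -> float:
    """Stated confidence minus agreement; positive values suggest overconfidence."""
    _, agreement = resampling_agreement(answers)
    return stated_confidence - agreement


# Hypothetical samples for one prompt, where the model said "I'm certain"
# (treated here as a stated confidence of 1.0).
samples = ["1912", "1913", " 1912 ", "1915", "1911"]
top, agreement = resampling_agreement(samples)
gap = overconfidence_gap(1.0, samples)
```

In this made-up case the most common answer covers only two of five samples, so a claim of certainty is not supported by the model's own resampled outputs.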

Affected versions

Anthropic · claude-opus-4-8
OpenAI · gpt-4o
Google · gemini-2.0-flash
Meta · llama-3.3-70b

References

Source: https://arxiv.org/abs/2306.13063

Cite this

Qlarify Labs. (2026). Poor uncertainty calibration / overconfidence. Retrieved from https://labs.qlarify.fi/findings/overconfidence-calibration