← AI tech topics
Refusals & over-caution
Safety training has a false-positive side: models refuse legitimate requests — medical questions, security research, fiction involving conflict — because they pattern-match to something forbidden. Over-refusal is a real quality defect, in tension with jailbreak resistance: tuning toward one moves the other. It is also unevenly distributed across topics and phrasings, so it needs its own evaluation rather than being treated as the safe default. The linked findings document refusals of clearly benign requests.
Findings (1)
Methods
References
Cite this
Qlarify Labs. (2026). Refusals & over-caution. Retrieved from https://labs.qlarify.fi/topics/refusals-and-overcaution