← AI tech topics

Refusals & over-caution

Safety training has a false-positive side: models refuse legitimate requests — medical questions, security research, fiction involving conflict — because they pattern-match to something forbidden. Over-refusal is a real quality defect, in tension with jailbreak resistance: tuning toward one moves the other. It is also unevenly distributed across topics and phrasings, so it needs its own evaluation rather than being treated as the safe default. The linked findings document refusals of clearly benign requests.

Findings (1)

Methods

References

Cite this

Qlarify Labs. (2026). Refusals & over-caution. Retrieved from https://labs.qlarify.fi/topics/refusals-and-overcaution