← Findings
MediumRefusalReviewer-confirmedPublished

Over-refusal of benign requests

Safety tuning causes refusal of harmless requests that merely resemble sensitive ones.

Published June 26, 2026

Reproducibility
Sometimes
Severity
Medium
Confidence
Reviewer-confirmed

Details

Models sometimes refuse legitimate requests (security education, medical information, fiction) because surface features pattern-match to disallowed content — degrading usefulness and frustrating users. The flip side of jailbreak hardening.

Found with

Evidence

A benign request for general security concepts is refused as 'potentially harmful'.
Illustrative example — see the linked reference for the documented evidence.

Affected versions

Anthropic · claude-opus-4-8Anthropic · claude-sonnet-4-6OpenAI · gpt-4oGoogle · gemini-2.0-flash

References

Source: https://arxiv.org/abs/2308.01263

Cite this

Qlarify Labs. (2026). Over-refusal of benign requests. Retrieved from https://labs.qlarify.fi/findings/over-refusal