Paper · High credibility · arXiv · Röttger et al. · August 1, 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Our summary
A test suite of clearly safe prompts designed to surface exaggerated safety behaviour, in which models refuse benign requests that merely resemble unsafe ones or mention sensitive words.
Why it matters
Over-refusal is a measurable usability failure of safety tuning, and XSTest gives a concrete way to catch it.
Published June 26, 2026
Cite this
Qlarify Labs. (2026). XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models. Retrieved from https://labs.qlarify.fi/references/xstest-exaggerated-safety-2023