Paper · High credibility · arXiv · Röttger et al. · August 1, 2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Our summary

A test suite of clearly safe prompts designed to surface exaggerated safety behaviour, where models refuse benign requests because they superficially resemble unsafe ones or mention sensitive words.

Why it matters

Over-refusal is a measurable usability failure of safety tuning, and XSTest gives a concrete way to catch it.

Published June 26, 2026

Cite this

Qlarify Labs. (2026). XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models. Retrieved from https://labs.qlarify.fi/references/xstest-exaggerated-safety-2023