Framing a disallowed request as fiction or persona role-play can induce the model to bypass its safety policy.
Published June 26, 2026
Reproducibility
Sometimes
Severity
High
Confidence
Reviewer-confirmed
Details
By asking the model to adopt a persona or 'play a character' that is not bound by its safety policy, users can sometimes elicit content the model would otherwise refuse. This is a classic, recurring jailbreak family that resurfaces in new forms after each mitigation.