← AI tech topics
Jailbreaking
A jailbreak is a prompt that talks a model out of its safety training — role-play framings, encodings, many-shot patterns, or automated search over adversarial suffixes. Unlike prompt injection (which hijacks an application's instructions), jailbreaking targets the model's own refusal behavior. Defenses improve every generation and none have held: each new model ships with new bypasses found within days. The linked findings document techniques that worked and how vendors responded.
Findings (2)
References
Cite this
Qlarify Labs. (2026). Jailbreaking. Retrieved from https://labs.qlarify.fi/topics/jailbreaking