Medium · Reasoning · Reviewer-confirmed · Published

Failure to honor negation in instructions

Models frequently do the opposite of a 'do not' instruction, or ignore the negation entirely.

Published June 26, 2026

Reproducibility: Sometimes
Severity: Medium
Confidence: Reviewer-confirmed

Details

Negated constraints ('do not include X', 'avoid Y') are followed inconsistently; the negated concept is often still produced. Tightening or repeating the instruction helps only partially.

Found with

Evidence

Instruction: Summarize without mentioning prices.
Output: includes prices.
This is an illustrative example of the metamorphic relation, not a recorded transcript; see the linked reference for the studied evidence.
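A check of this kind could be automated roughly as follows. This is a minimal sketch, not the method used in the linked reference: the `generate` callable, the `fake_model` stub, and the regular expression for what counts as a "price" are all assumptions made for illustration. Because the finding reproduces only sometimes, the sketch samples the same prompt several times and reports a violation rate rather than a single pass/fail.

```python
import re
from typing import Callable

# Assumption: a "price" is a currency symbol or code next to a digit,
# or the literal word "price"/"prices". A real evaluation would likely
# need a broader detector.
PRICE_PATTERN = re.compile(
    r"(?:[$€£]\s?\d)|(?:\d\s?(?:USD|EUR|GBP)\b)|\bprices?\b",
    re.IGNORECASE,
)


def mentions_price(text: str) -> bool:
    """Return True if the text contains something that looks like a price."""
    return PRICE_PATTERN.search(text) is not None


def violation_rate(
    generate: Callable[[str], str], prompt: str, trials: int = 5
) -> float:
    """Fraction of sampled outputs that break the 'no prices' constraint."""
    violations = sum(mentions_price(generate(prompt)) for _ in range(trials))
    return violations / trials


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it ignores the negation."""
    return "The laptop has a 14-inch screen and costs $999."


if __name__ == "__main__":
    rate = violation_rate(fake_model, "Summarize without mentioning prices.")
    print(f"violation rate: {rate:.2f}")
```

Replacing `fake_model` with a real model client, and running the same function on tightened or repeated variants of the instruction, would give a rough way to compare how much each variant reduces violations.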

Affected versions

Anthropic · claude-opus-4-8
Anthropic · claude-sonnet-4-6
OpenAI · gpt-4o
Google · gemini-2.0-flash
Meta · llama-3.3-70b

References

Source: https://arxiv.org/abs/2511.02108

Cite this

Qlarify Labs. (2026). Failure to honor negation in instructions. Retrieved from https://labs.qlarify.fi/findings/negation-handling