Medium · Reasoning · Reviewer-confirmed · Published

Inconsistent answers to semantically equivalent prompts

Trivial rewordings of the same question yield materially different answers.

Published June 26, 2026

Reproducibility: Often
Severity: Medium
Confidence: Reviewer-confirmed

Details

Paraphrasing a prompt (reordering clauses, swapping synonyms, or changing formatting) can change the model's answer even though the question is the same. This exposes brittleness and prompt sensitivity that undermine reproducibility.

Evidence

Two paraphrases of one question return different numeric answers.
This is an illustrative example of the metamorphic relation (semantically equivalent prompts should yield the same answer); see the linked reference for the studied evidence.
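
One way to check the metamorphic relation is to send several paraphrases of one question and compare the numeric answers that come back. The sketch below is a minimal illustration, not the method used in the linked reference. The names `extract_number`, `consistency_report`, and `fake_model` are hypothetical, and `fake_model` is a deliberately brittle stand-in for a real model call, so its outputs are not measured results.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Optional

# Matches integers and decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_number(text: str) -> Optional[float]:
    """Return the last number in a response, or None if there is none."""
    # Drop thousands separators so "1,200" is read as 1200.
    matches = _NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def consistency_report(ask: Callable[[str], str],
                       paraphrases: Iterable[str]) -> dict:
    """Ask every paraphrase and summarise how well the answers agree.

    `ask` is any callable that maps a prompt to a response string
    (assumption: the caller wraps their own model client in it).
    """
    prompts = list(paraphrases)
    if not prompts:
        raise ValueError("at least one paraphrase is required")

    answers = {p: extract_number(ask(p)) for p in prompts}
    counts = Counter(answers.values())
    majority, majority_count = counts.most_common(1)[0]
    return {
        "answers": answers,                     # prompt -> extracted number
        "consistent": len(counts) == 1,         # relation holds if all agree
        "agreement": majority_count / len(prompts),
        "majority": majority,
    }


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model; brittle to clause order on purpose."""
    if prompt.startswith("If"):
        return "The answer is 12."
    return "It comes to 15."
```

Called as `consistency_report(fake_model, [...])` with two or more paraphrases, the report flags the relation as violated whenever the extracted numbers differ. Comparing only the extracted number, rather than the full response text, keeps harmless wording differences from counting as inconsistencies.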

Affected versions

Anthropic · claude-opus-4-8
Anthropic · claude-sonnet-4-6
OpenAI · gpt-4o
Google · gemini-2.0-flash
Meta · llama-3.3-70b

References

Source: https://arxiv.org/abs/2511.02108

Cite this

Qlarify Labs. (2026). Inconsistent answers to semantically equivalent prompts. Retrieved from https://labs.qlarify.fi/findings/rephrasing-inconsistency