Inconsistent answers to semantically equivalent prompts
Trivial rewordings of the same question yield materially different answers.
Published June 26, 2026
- Reproducibility: Often
- Severity: Medium
- Confidence: Reviewer-confirmed
Details
Paraphrasing a prompt (reordering clauses, swapping synonyms, or changing formatting) can change the model's answer. This exposes brittleness and prompt sensitivity that undermine reproducibility.
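The perturbation types listed above can be generated mechanically. The Python below is a minimal sketch, not the method used to produce this finding. It assumes a hypothetical two-entry synonym table (`SYNONYMS`), and all helper names are illustrative. It covers synonym swaps and formatting changes only. Clause reordering is left out because confirming that a reordered prompt still means the same thing usually needs human review.

```python
import re
from typing import List

# Hypothetical synonym table. A real suite needs a vetted, domain-specific
# list, because a "synonym" that shifts meaning breaks the equivalence.
SYNONYMS = {
    "calculate": "compute",
    "total": "overall",
}


def swap_synonyms(prompt: str) -> str:
    """Replace each listed word with its synonym (whole words, case-sensitive)."""
    for word, synonym in SYNONYMS.items():
        prompt = re.sub(rf"\b{re.escape(word)}\b", synonym, prompt)
    return prompt


def change_formatting(prompt: str) -> str:
    """Put each sentence on its own line; the words themselves are unchanged."""
    sentences = re.split(r"(?<=[.?!])\s+", prompt.strip())
    return "\n".join(sentences)


def paraphrase_variants(prompt: str) -> List[str]:
    """Return the original prompt followed by its perturbed variants."""
    return [prompt, swap_synonyms(prompt), change_formatting(prompt)]


if __name__ == "__main__":
    for variant in paraphrase_variants(
        "Each item costs $4. Please calculate the total cost of 3 items."
    ):
        print(repr(variant))
```

Each variant keeps the arithmetic content identical, so any change in the model's answer across them points to prompt sensitivity rather than a different question.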
Found with
Metamorphic relation (paraphrase invariance): a reworded but semantically equivalent prompt should produce the same answer. The model returns different answers to equivalent phrasings, violating this relation. Paraphrase invariance is the canonical metamorphic relation for LLMs.
🔬 Perturbation testing: semantically neutral paraphrase perturbations change the answer, exposing prompt brittleness.
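A paraphrase-invariance check can be sketched as a small harness: query the model once per paraphrase, normalize the answers, and flag any disagreement. The source does not describe the harness behind this finding, so treat the Python below as an illustrative sketch. `toy_model` is a hypothetical stand-in for a real model call, the prompts are invented, and the default normalizer (strip whitespace, lowercase) is an assumption.

```python
from typing import Callable, Dict, Sequence


def check_paraphrase_invariance(
    model: Callable[[str], str],
    paraphrases: Sequence[str],
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> Dict[str, object]:
    """Ask the model each paraphrase and test the paraphrase-invariance relation:
    after normalization, every answer should be identical."""
    answers = [normalize(model(p)) for p in paraphrases]
    n = len(answers)
    pairs = n * (n - 1) // 2
    # Count unordered pairs of paraphrases whose normalized answers agree.
    agreeing = sum(
        answers[i] == answers[j] for i in range(n) for j in range(i + 1, n)
    )
    return {
        "holds": len(set(answers)) <= 1,
        "distinct_answers": sorted(set(answers)),
        "pairwise_agreement": agreeing / pairs if pairs else 1.0,
    }


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call. It keys on surface wording,
    # which mimics the brittleness described in this finding.
    return "12" if "calculate" in prompt else "7"


# Hypothetical equivalent phrasings of one question.
PARAPHRASES = [
    "Each item costs $4. Please calculate the total cost of 3 items.",
    "Each item costs $4. Please compute the overall cost of 3 items.",
    "Each item costs $4.\nPlease calculate the total cost of 3 items.",
]

result = check_paraphrase_invariance(toy_model, PARAPHRASES)
print(result)
```

Here the relation fails: two phrasings get "12" and one gets "7", so only one of the three pairs agrees. Exact matching after normalization suits short answers such as numbers or labels; longer free-form answers would need a looser comparison. The pairwise agreement score gives a graded measure of brittleness rather than a single pass/fail bit.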
Evidence
Two paraphrases of one question return different numeric answers.
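Because the evidence concerns numeric answers, comparing raw strings can over-report disagreement: "12" and "$12.00" are the same answer. The Python below is a minimal sketch of a more forgiving comparison. It assumes that the last number in an answer is the final answer and uses an arbitrary relative tolerance of 1e-6. The finding does not publish the prompts or the numbers, so the example strings are hypothetical.

```python
import math
import re
from typing import Optional

# Integers or decimals, optionally negative, with optional comma thousands separators.
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def final_number(answer: str) -> Optional[float]:
    """Return the last number in a free-text answer, or None if there is none."""
    matches = NUMBER.findall(answer)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def numeric_answers_agree(a: str, b: str, rel_tol: float = 1e-6) -> bool:
    """True if both answers contain a number and the final numbers match within rel_tol."""
    x, y = final_number(a), final_number(b)
    if x is None or y is None:
        return False
    return math.isclose(x, y, rel_tol=rel_tol)


if __name__ == "__main__":
    print(numeric_answers_agree("The total is $12.", "That comes to 12.00 dollars"))
    print(numeric_answers_agree("The total is $12.", "The total is $7."))
```

The last-number heuristic is crude (it misreads ranges such as "3-4" and European decimal commas), but it separates genuine numeric disagreement from differences in wording or formatting.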
Affected versions
References
Source: https://arxiv.org/abs/2511.02108
Cite this
Qlarify Labs. (2026). Inconsistent answers to semantically equivalent prompts. Retrieved from https://labs.qlarify.fi/findings/rephrasing-inconsistency