Logic & consistency testing
Check that the model's outputs obey the rules of logic — valid inference, transitivity, symmetry, no self-contradiction — across related questions.
Published June 26, 2026
How it works
Fluent prose can hide broken reasoning. Logic and consistency testing asserts the properties any sound reasoner must satisfy: if A > B and B > C, then A > C; a model that knows 'A is B' should be able to answer the reverse question about B; a negated instruction should invert the result; and claims made early in a session should not be contradicted later. Posing structured sets of related questions and checking these invariants exposes reasoning failures that no single answer, taken alone, would reveal.
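As an illustration of checking one such invariant, here is a minimal sketch of a transitivity test. It assumes the model's pairwise judgments can be wrapped in a yes/no function; `prefers`, `stub_prefers`, and the canned answers are hypothetical stand-ins for a real model call, not part of any published harness.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple


def transitivity_violations(
    items: Sequence[str],
    prefers: Callable[[str, str], bool],
) -> List[Tuple[str, str, str]]:
    """Return every triple (a, b, c) where the model affirms a > b and
    b > c but does not affirm a > c."""
    violations = []
    for a, b, c in permutations(items, 3):
        if prefers(a, b) and prefers(b, c) and not prefers(a, c):
            violations.append((a, b, c))
    return violations


# Hypothetical canned judgments standing in for model answers to
# questions such as "Is A greater than B?".
_CANNED = {("A", "B"): True, ("B", "C"): True, ("A", "C"): False}


def stub_prefers(x: str, y: str) -> bool:
    # Any pair not listed is treated as "the model said no".
    return _CANNED.get((x, y), False)


violations = transitivity_violations(["A", "B", "C"], stub_prefers)
print(violations)
```

Each individual answer in the stub looks plausible on its own; the failure only appears when the three related answers are checked together.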
When to use it
Evaluating reasoning quality; multi-step or multi-turn tasks where internal consistency matters; catching relational and negation failures.
Limitations
Captures logical form, not real-world correctness: a model can be perfectly consistent and consistently wrong. Enumerating the relevant invariants for an open-ended task is also hard.
Method yield
- Findings: 3
- Versions spanned: 6
- Yield score: 8
Severity-weighted across the published findings below.
Findings it surfaces (3)
Documented failures this method catches — the evidence it works.
- The reversal curse: 'A is B' not generalizing to 'B is A' (Severity: Medium; Category: Reasoning)
A model trained on 'A is B' frequently fails to answer 'B is ?', revealing that learned relations are not symmetric.
How it found it: Testing whether 'A is B' implies 'B is A' exposes the missing symmetry in stored relations.
- Failure to honor negation in instructions (Severity: Medium; Category: Reasoning)
Models frequently do the opposite of a 'do not' instruction, or ignore the negation entirely.
How it found it: Checking that a negated instruction actually inverts the output exposes ignored negations.
- Self-contradiction within a single conversation (Severity: Low; Category: Reasoning)
Models assert one fact and later assert its opposite within the same session.
How it found it: Cross-checking claims made across turns surfaces mutually exclusive assertions the model never flags.
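The three checks behind these findings can be sketched roughly as follows. This is an illustrative outline, not the harness used to produce the findings: `ask` stands for any function that sends a prompt to a model and returns its text, the people and prompts are invented, and the cross-turn check assumes claims have already been extracted into (turn, proposition, truth value) tuples by some upstream step.

```python
from typing import Callable, Dict, List, Tuple

Ask = Callable[[str], str]  # hypothetical model interface: prompt -> reply


def reversal_gap(ask: Ask, forward_q: str, forward_a: str,
                 reverse_q: str, reverse_a: str) -> bool:
    """True if the forward question is answered but the reversed one is not."""
    fwd_ok = forward_a.lower() in ask(forward_q).lower()
    rev_ok = reverse_a.lower() in ask(reverse_q).lower()
    return fwd_ok and not rev_ok


def negation_ignored(ask: Ask, negated_prompt: str, banned: str) -> bool:
    """True if the reply still contains the term the prompt said to avoid."""
    return banned.lower() in ask(negated_prompt).lower()


def contradictions(
    claims: List[Tuple[int, str, bool]],
) -> List[Tuple[str, int, int]]:
    """Return (proposition, first_turn, later_turn) for each flipped claim."""
    seen: Dict[str, Tuple[int, bool]] = {}
    flipped = []
    for turn, prop, value in claims:
        if prop in seen and seen[prop][1] != value:
            flipped.append((prop, seen[prop][0], turn))
        else:
            seen.setdefault(prop, (turn, value))
    return flipped


# Canned replies standing in for a model; all names are invented.
_REPLIES = {
    "Who mentored Alice Example?": "Bob Sample mentored her.",
    "Whom did Bob Sample mentor?": "I'm not sure.",
    "Describe the sky. Do not mention the color blue.": "The sky is blue.",
}


def stub_ask(prompt: str) -> str:
    return _REPLIES.get(prompt, "")


gap = reversal_gap(stub_ask,
                   "Who mentored Alice Example?", "Bob Sample",
                   "Whom did Bob Sample mentor?", "Alice Example")
ignored = negation_ignored(
    stub_ask, "Describe the sky. Do not mention the color blue.", "blue")
flips = contradictions([
    (1, "the meeting is on Tuesday", True),
    (2, "the budget is approved", True),
    (4, "the meeting is on Tuesday", False),
])
print(gap, ignored, flips)
```

Substring matching is a deliberate simplification here; a real evaluation would likely need a more robust way to judge whether a reply contains the expected answer or the banned content.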
Cite this
Qlarify Labs. (2026). Logic & consistency testing. Retrieved from https://labs.qlarify.fi/methods/logic-consistency-testing