Metamorphic · Established

Perturbation testing

Apply small, meaning-preserving changes to an input — typos, spacing, paraphrase, reordering — and check that the output stays stable. When it doesn't, you've measured brittleness.

Published June 26, 2026

How it works

A robust system should be indifferent to changes that don't change meaning: a stray typo, extra whitespace, a synonym, a reordered clause. Perturbation testing applies these small transformations at scale and flags every case where the answer moves. It is a focused, robustness-oriented cousin of metamorphic testing — the relation is simply 'meaning-preserving in, same answer out' — and it exposes the prompt-sensitivity that quietly undermines reproducibility.
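
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `double_spaces`, `swap_adjacent`, `unstable_variants`, and the toy `brittle_router` are hypothetical names introduced here, and a real run would call the system under test instead of a substring matcher.

```python
from typing import Callable, List

Perturbation = Callable[[str], str]

def double_spaces(text: str) -> str:
    """Spacing perturbation: widen every single space to two."""
    return text.replace(" ", "  ")

def swap_adjacent(text: str, index: int = 1) -> str:
    """Typo perturbation: swap the characters at index and index + 1."""
    if index < 0 or index + 1 >= len(text):
        return text  # nothing to swap, return the input unchanged
    return text[:index] + text[index + 1] + text[index] + text[index + 2:]

def unstable_variants(
    system: Callable[[str], str],
    text: str,
    perturbations: List[Perturbation],
) -> List[str]:
    """Return every perturbed input whose output differs from the baseline."""
    baseline = system(text)
    flagged = []
    for perturb in perturbations:
        variant = perturb(text)
        if system(variant) != baseline:
            flagged.append(variant)
    return flagged

def brittle_router(text: str) -> str:
    """Hypothetical stand-in for the system under test: exact phrase match."""
    return "refund" if "refund my order" in text.lower() else "other"

variants = unstable_variants(
    brittle_router,
    "Please refund my order",
    [double_spaces, swap_adjacent],
)
print(variants)  # only the double-spaced variant changes the answer
```

Here the typo lands in "Please" and leaves the matched phrase intact, so only the spacing change is flagged. Comparing outputs with `!=` is the simplest possible stability check; free-text outputs would likely need a looser equivalence test.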

When to use it

Robustness hardening; quantifying prompt-sensitivity; regression-guarding inputs that users will phrase many different ways.
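
One way to quantify prompt-sensitivity and guard against regressions is a flip rate: the fraction of (input, perturbation) pairs whose output differs from the unperturbed baseline. The sketch below assumes exact-match comparison; `flip_rate`, `within_budget`, the toy `exact_match_router`, and the 0.1 budget are all hypothetical choices made for illustration.

```python
from typing import Callable, Sequence

def double_spaces(text: str) -> str:
    """Spacing perturbation: widen every single space to two."""
    return text.replace(" ", "  ")

def trailing_space(text: str) -> str:
    """Spacing perturbation: append one trailing space."""
    return text + " "

def flip_rate(
    system: Callable[[str], str],
    inputs: Sequence[str],
    perturbations: Sequence[Callable[[str], str]],
) -> float:
    """Fraction of (input, perturbation) pairs where the output changes."""
    trials = 0
    flips = 0
    for text in inputs:
        baseline = system(text)
        for perturb in perturbations:
            trials += 1
            if system(perturb(text)) != baseline:
                flips += 1
    return flips / trials if trials else 0.0

def within_budget(rate: float, max_rate: float = 0.1) -> bool:
    """Regression guard: True while the flip rate stays under the budget."""
    return rate <= max_rate

def exact_match_router(text: str) -> str:
    """Hypothetical brittle system: only an exact string is recognised."""
    return "billing" if text == "cancel my plan" else "unknown"

rate = flip_rate(
    exact_match_router,
    ["cancel my plan", "hello"],
    [double_spaces, trailing_space],
)
print(rate)                 # 0.5: both variants of the first input flip
print(within_budget(rate))  # False: the guard would fail this build
```

Tracking this number per release gives a single figure to compare across versions, at the cost of hiding which perturbation types cause the flips.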

Limitations

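You must ensure each perturbation truly preserves meaning: an over-aggressive change alters the correct answer, so a legitimate difference in output is reported as a failure (a false positive). The method also detects only instability; it cannot tell you which of the diverging answers is correct.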
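
The false-positive risk is easy to reproduce. In the sketch below, a reordering perturbation changes what the input means, so a toy system that answers correctly both times is still flagged as unstable. Both `reverse_word_order` and `first_is_larger` are hypothetical names used only for this illustration.

```python
def reverse_word_order(text: str) -> str:
    """Over-aggressive reordering: reverses the words, which can change meaning."""
    return " ".join(reversed(text.split()))

def first_is_larger(text: str) -> str:
    """Toy system that is always correct: is the first number larger than the second?"""
    first, second = [int(token) for token in text.split() if token.isdigit()]
    return "yes" if first > second else "no"

original = "7 exceeds 3"
perturbed = reverse_word_order(original)  # "3 exceeds 7"

baseline = first_is_larger(original)      # "yes", and correct
answer = first_is_larger(perturbed)       # "no", and also correct

# The outputs differ, so a naive stability check reports a failure even
# though the system made no mistake: the perturbation changed the question.
flagged = baseline != answer
print(perturbed, baseline, answer, flagged)
```

A flag like this says nothing about correctness on its own. Deciding which answer is right still needs a reference label or a human review.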

Method yield

Findings: 3 (2 Medium, 1 Low)
Versions spanned: 7
Yield score: 8

Severity-weighted across the published findings below.

Findings it surfaces (3)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Perturbation testing. Retrieved from https://labs.qlarify.fi/methods/perturbation-testing