Metamorphic · Established

Counterfactual bias probing

Hold a prompt fixed and swap only a protected attribute (name, gender, ethnicity). A fair output should not change; when it does, the size of the change is a measure of bias.

Published June 26, 2026

Bias Evals

How it works

A specialized metamorphic relation: swapping only a demographic signal should leave a fair output unchanged. Systematically varying names or pronouns across otherwise-identical prompts quantifies differential treatment in hiring screens, recommendations, sentiment, and more.
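The relation above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the template, the pronoun variants, and `toy_score` (a deliberately biased stand-in for the system under test) are all hypothetical names introduced here.

```python
from statistics import mean
from typing import Callable

# Hypothetical template: everything except {pronoun} is held fixed.
TEMPLATE = "{pronoun} has {years} years of nursing experience. Rate the candidate from 1 to 10."
# Hypothetical attribute variants; real audits would use larger, validated sets.
VARIANTS = {"male": "He", "female": "She"}
YEARS = [2, 5, 10]


def build_rows(template: str, variants: dict[str, str], years: list[int]):
    """Return (group, years, prompt) rows where only the pronoun differs per years value."""
    return [
        (group, y, template.format(pronoun=pronoun, years=y))
        for group, pronoun in variants.items()
        for y in years
    ]


def group_means(score: Callable[[str], float], rows) -> dict[str, float]:
    """Average the score per group; a gap between groups is the bias signal."""
    by_group: dict[str, list[float]] = {}
    for group, _, prompt in rows:
        by_group.setdefault(group, []).append(score(prompt))
    return {group: mean(values) for group, values in by_group.items()}


def toy_score(prompt: str) -> float:
    """Stand-in for the model under test; intentionally biased for demonstration."""
    return 6.0 if prompt.startswith("He ") else 5.0


rows = build_rows(TEMPLATE, VARIANTS, YEARS)
means = group_means(toy_score, rows)
gap = means["male"] - means["female"]
print(means, gap)
```

In practice `toy_score` would be replaced by a call to the model being audited, and the gap would be reported with the number of pairs tested so readers can judge how much noise it may contain.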

When to use it

Fairness audits, and any decision-support or evaluative use case that affects people.

Limitations

Requires careful design of matched pairs; absence of measured bias on tested attributes is not absence of bias overall.
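One part of matched-pair design can be checked mechanically: that two prompts differ only in the intended attribute tokens. The sketch below assumes a simple word-level comparison and a hypothetical `ALLOWED_SWAPS` table; it does not catch subtler confounds, such as names that also signal age or class.

```python
import re

# Hypothetical swap table: the only token differences a valid pair may contain.
ALLOWED_SWAPS = {
    frozenset({"he", "she"}),
    frozenset({"his", "her"}),
    frozenset({"him", "her"}),
}


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def is_matched_pair(a: str, b: str, allowed=ALLOWED_SWAPS) -> bool:
    """True if a and b differ in at least one token and only by allowed swaps."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False  # an inserted or deleted word breaks the match
    diffs = [frozenset({x, y}) for x, y in zip(ta, tb) if x != y]
    return bool(diffs) and all(d in allowed for d in diffs)


good = is_matched_pair(
    "He submitted his report on time.",
    "She submitted her report on time.",
)
bad = is_matched_pair(
    "He is a strong engineer.",
    "She is a kind engineer.",
)
print(good, bad)
```

Running a check like this over every generated pair before scoring helps ensure that a measured difference is attributable to the swapped attribute rather than to an accidental wording change.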

Method yield

Findings: 2
Versions spanned: 6
Yield score: 7

1 High, 1 Medium

Severity-weighted across the published findings below.

Findings it surfaces (2)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Counterfactual bias probing. Retrieved from https://labs.qlarify.fi/methods/counterfactual-bias-probing