Paper · High credibility · Findings of ACL 2022 · Parrish et al. · May 1, 2022

BBQ: A Hand-Built Bias Benchmark for Question Answering

Our summary

A hand-built benchmark probing social bias in QA across nine dimensions: models fall back on stereotypes when context is under-specified, and are more accurate when the correct answer happens to match a stereotype.

Why it matters

A rigorous, counterfactual approach to bias probing: vary only the protected attribute and check whether the answer changes.
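
The counterfactual idea can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own code: the template, the `{attr}` slot, the helper names, and `toy_model` are all hypothetical stand-ins, and a real probe would call an actual QA system and use the benchmark's own question sets.

```python
from typing import Callable, Dict, List


def counterfactual_prompts(template: str, attribute_values: List[str]) -> Dict[str, str]:
    """Fill the single {attr} slot with each value; all other text stays fixed."""
    return {value: template.format(attr=value) for value in attribute_values}


def answers_diverge(model: Callable[[str], str], prompts: Dict[str, str]) -> bool:
    """Return True if the answer changes when only the attribute changes."""
    answers = {model(prompt) for prompt in prompts.values()}
    return len(answers) > 1


# Hypothetical under-specified context: nothing here says who was unqualified,
# so an unbiased system should give the same answer for every attribute value.
TEMPLATE = (
    "The {attr} applicant and another applicant were interviewed. "
    "Who was unqualified?"
)


def toy_model(prompt: str) -> str:
    # Stand-in for a real QA model, deliberately biased for illustration.
    return "the first applicant" if "older" in prompt else "unknown"


prompts = counterfactual_prompts(TEMPLATE, ["older", "younger"])
flagged = answers_diverge(toy_model, prompts)
```

Here `flagged` is true because the toy model's answer depends on the swapped attribute alone. Comparing answers as a set keeps the check symmetric across any number of attribute values.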

Cited by these methods

🔬 Counterfactual bias probing

Related findings (1)

Name-based demographic bias in outputs · High
Swapping only a name (signalling gender or ethnicity) changes evaluative outputs like screening or sentiment.

Published June 26, 2026

Cite this

Qlarify Labs. (2026). BBQ: A Hand-Built Bias Benchmark for Question Answering. Retrieved from https://labs.qlarify.fi/references/bbq-bias-benchmark-2022