Paper · High credibility · arXiv · Steven Cho, Stefano Ruberto, Valerio Terragni · November 3, 2025

Metamorphic Testing of Large Language Models for Natural Language Processing

Our summary

A large-scale study applying metamorphic testing to LLMs on NLP tasks: the authors collect 191 metamorphic relations from the literature, implement 36, and run roughly 560,000 metamorphic tests across three LLMs to surface incorrect behaviour without labelled oracles.
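For readers new to the technique, here is a minimal sketch of a single metamorphic relation in Python. It is not the authors' implementation. The keyword classifier `brittle_sentiment` is a hypothetical stand-in for an LLM call, and the helper names and sample inputs are assumptions made for illustration. The idea it demonstrates is the one in the summary: apply a transformation that should leave the answer unchanged, then flag any input whose output changes, with no labelled ground truth involved.

```python
from typing import Callable, List, Tuple


def brittle_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM: a case-sensitive keyword classifier."""
    positives = sum(word in text for word in ("good", "great", "love"))
    negatives = sum(word in text for word in ("bad", "awful", "hate"))
    return "positive" if positives >= negatives else "negative"


def add_neutral_suffix(text: str) -> str:
    """Transformation assumed to preserve sentiment: append a neutral clause."""
    return text + " I watched it on Tuesday."


def run_invariance_mr(
    model: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str]]:
    """Return (source, follow-up) pairs whose outputs differ.

    No expected labels are needed: a violation is any pair where the
    model's answer changes under a transformation that should not matter.
    """
    violations = []
    for source in inputs:
        follow_up = transform(source)
        if model(source) != model(follow_up):
            violations.append((source, follow_up))
    return violations


if __name__ == "__main__":
    samples = ["I love this film", "What an awful plot"]
    # The neutral suffix leaves every prediction unchanged.
    print(run_invariance_mr(brittle_sentiment, add_neutral_suffix, samples))
    # Upper-casing exposes the classifier's case sensitivity.
    print(run_invariance_mr(brittle_sentiment, str.upper, samples))
```

Each (relation, input) pair counts as one metamorphic test, so a few dozen relations applied to a large input set across several models quickly reaches the scale the study reports.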

Why it matters

Shows that metamorphic testing scales to modern LLMs and exposes concrete failures, providing direct evidence that the method works.

Published June 26, 2026

Cite this

Qlarify Labs. (2026). Metamorphic Testing of Large Language Models for Natural Language Processing. Retrieved from https://labs.qlarify.fi/references/metamorphic-testing-of-llms-nlp-2025