Paper · High credibility · arXiv · Steven Cho, Stefano Ruberto, Valerio Terragni · November 3, 2025
Metamorphic Testing of Large Language Models for Natural Language Processing
Our summary
A large-scale study applying metamorphic testing to LLMs on NLP tasks: the authors collect 191 metamorphic relations from the literature, implement 36, and run roughly 560,000 metamorphic tests across three LLMs to surface incorrect behaviour without labelled oracles.
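To make the idea of testing without labelled oracles concrete, here is a minimal sketch of one invariance-style metamorphic relation in Python. It is not taken from the paper: the toy keyword classifier stands in for an LLM call, and the names `toy_sentiment_model`, `append_neutral_sentence`, `swap_synonym`, and `run_invariance_mr` are hypothetical, chosen only for illustration. The relation checked is "a meaning-preserving change to the input should not change the predicted label"; no ground-truth label is ever consulted.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM: a keyword-based sentiment classifier.
# In a real setup this function would wrap a call to the model under test.
def toy_sentiment_model(text: str) -> str:
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    words = text.lower().replace(".", " ").replace(",", " ").split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score >= 0 else "negative"

# Two illustrative input transformations that are assumed to preserve sentiment.
def append_neutral_sentence(text: str) -> str:
    return text + " The review was written on a Tuesday."

def swap_synonym(text: str) -> str:
    return text.replace("great", "superb")

Violation = Tuple[str, str, str, str]  # (source, follow-up, source output, follow-up output)

def run_invariance_mr(
    model: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[Violation]:
    """Return every case where the model's output changes under the transformation."""
    violations: List[Violation] = []
    for source in inputs:
        follow_up = transform(source)
        out_source = model(source)
        out_follow_up = model(follow_up)
        # The relation compares two model outputs with each other,
        # so no labelled expected answer is needed.
        if out_source != out_follow_up:
            violations.append((source, follow_up, out_source, out_follow_up))
    return violations

SOURCE_INPUTS = [
    "The food was great.",
    "The service was bad but the food was great.",
]

if __name__ == "__main__":
    for transform in (append_neutral_sentence, swap_synonym):
        found = run_invariance_mr(toy_sentiment_model, transform, SOURCE_INPUTS)
        print(transform.__name__, "violations:", len(found))
```

With this toy model, appending a neutral sentence produces no violations, while the synonym swap flips the label on the second input because the classifier does not know the word "superb". A flipped label of this kind is what a metamorphic test reports as a suspected failure.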
Why it matters
Shows that metamorphic testing scales to modern LLMs and surfaces concrete failures without labelled data, which is direct evidence that the method works in practice.
Published June 26, 2026
Cite this
Qlarify Labs. (2026). Metamorphic Testing of Large Language Models for Natural Language Processing. Retrieved from https://labs.qlarify.fi/references/metamorphic-testing-of-llms-nlp-2025