Robustness & input perturbations

A robust system gives the same answer to the same question asked slightly differently. LLMs often don't: typos, reordered options, added whitespace, or an irrelevant sentence can change the output, so a single passing test proves little. Robustness testing applies perturbations systematically (metamorphic relations, paraphrase sets, character-level noise) and measures the flip rate: the fraction of perturbed inputs whose answer differs from the answer to the original input. The linked methods and findings show how small a perturbation can be and still matter.
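As a rough illustration of measuring a flip rate, the sketch below generates a few perturbed variants of a prompt (extra whitespace, an appended irrelevant sentence, single adjacent-character typos) and counts how often the answer changes. Everything here is an assumption made for illustration: the function names, the choice of perturbations, and the `model` parameter, which stands in for any callable mapping a prompt string to an answer string. Exact string comparison is used as the equality check; a real evaluation would likely normalize or semantically compare answers.

```python
import random
from typing import Callable, Iterable, List


def add_whitespace(text: str) -> str:
    """Double every space and append trailing whitespace."""
    return text.replace(" ", "  ") + " \n"


def append_irrelevant_sentence(text: str) -> str:
    """Append a sentence that should not affect the answer (hypothetical distractor)."""
    return text + " The weather was mild that day."


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Introduce one typo by swapping two adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturbations(text: str, n_typos: int = 3, seed: int = 0) -> List[str]:
    """Return distinct perturbed variants of `text`, excluding `text` itself."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    variants = [add_whitespace(text), append_irrelevant_sentence(text)]
    variants += [swap_adjacent_chars(text, rng) for _ in range(n_typos)]
    # dict.fromkeys removes duplicates while preserving order
    return [v for v in dict.fromkeys(variants) if v != text]


def flip_rate(model: Callable[[str], str], prompt: str, variants: Iterable[str]) -> float:
    """Fraction of variants whose answer differs from the answer to `prompt`."""
    variants = list(variants)
    if not variants:
        return 0.0
    baseline = model(prompt)
    flips = sum(model(v) != baseline for v in variants)
    return flips / len(variants)
```

A possible usage, with a toy stand-in for the model: `flip_rate(my_model, prompt, perturbations(prompt))`. A model that ignores surface form scores 0.0 and one that only answers correctly on the exact original string scores 1.0; real systems usually fall somewhere in between, which is why the rate is more informative than a single pass or fail.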

Findings (4)

Production model internals extracted through the API (Safety, High)
Reasoning model attempts to subvert oversight under goal pressure (Safety, High)
Reasoning model degrades under few-shot prompting (Reasoning, Medium)
Verbatim training data extracted from a deployed chatbot (Safety, High)

Methods

Chaos engineering for AI systems
Distillation & model-extraction probing
Hallucination triggering
Perturbation testing
Threshold testing

Cite this

Qlarify Labs. (2026). Robustness & input perturbations. Retrieved from https://labs.qlarify.fi/topics/robustness-and-perturbations