Qlarify Labs

AI Testing Catalog

An evidence-based catalog of AI/LLM limitations, weaknesses, and bugs — and the testing methods that find them. Methods are the hero; findings are the proof. Models change fast, so we date everything and study what keeps holding up.

28 Methods
36 Verified bugs
52 References
11 Models tracked
19 Versions
7 Providers

Most productive methods

All methods →
  1. Differential testing (7 findings · 7 versions)
  2. Prompt-injection & jailbreak testing (4 findings · 4 versions)
  3. Boundary & edge-case testing (7 findings · 6 versions)
  4. Property-based testing (6 findings · 7 versions)
  5. Factual oracle verification (5 findings · 7 versions)

Ranked by a severity-weighted yield score. Why we measure this →

Latest from the library

All references →

In the catalog