AdversarialEstablished

Hallucination triggering

Deliberately steer the model toward fabrication — asking about non-existent entities or beyond its knowledge — to map where it invents instead of declining.

Published June 26, 2026

Hallucination Robustness

How it works

Rather than waiting for a hallucination to appear, this method goes looking for one: ask for a summary of a book that doesn't exist, citations on a niche claim, details of a post-cutoff event, or facts the model has no basis for. The failure being probed is the model's default to a fluent answer over an honest 'I don't know'. Systematically triggering it maps the conditions under which a system fabricates — the prerequisite for deciding where it must be grounded or gated.

When to use it

Assessing factual reliability before deployment; finding the topic and prompt shapes that most reliably induce fabrication; stress-testing retrieval grounding.

Limitations

Demonstrates that fabrication can be induced, not how often it happens in normal use. Designing prompts that are genuinely unanswerable (and not just obscure) takes care.

Method yield

Findings: 2
Versions spanned: 6
Yield score: 6

1 High1 Low

Severity-weighted across the published findings below. Why we measure this →

Findings it surfaces (2)

Documented failures this method catches — the evidence it works.

References & further reading

Cite this

Qlarify Labs. (2026). Hallucination triggering. Retrieved from https://labs.qlarify.fi/methods/hallucination-elicitation