Boundary · Established

Boundary & edge-case testing

Push inputs to their limits (very long contexts, token boundaries, empty or extreme values), where behavior tends to degrade sharply.

Published June 26, 2026

Evals · Context window

How it works

Failures cluster at boundaries: the edge of the context window, unusually long lists, zero/one/many counts, and maximum output length. Systematically walking inputs toward these limits exposes degradation that mid-range testing misses.
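A minimal sketch of how such a walk might be generated, assuming the limit can be expressed as a single integer size (tokens, list items, or characters) and that `check` is a hypothetical pass/fail harness you supply. The helper names `boundary_sizes` and `run_sweep` are illustrative, not part of any published tooling.

```python
def boundary_sizes(max_size: int, percents: tuple[int, ...] = (50, 90, 99)) -> list[int]:
    """Return input sizes clustered at the edges of [0, max_size].

    Covers zero/one/many counts, a few points walking toward the limit,
    and the sizes just below, at, and just above the (hypothetical) limit.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    candidates = {0, 1, 2, max_size - 1, max_size, max_size + 1}
    # Integer arithmetic avoids float rounding when stepping toward the limit.
    candidates.update(max_size * p // 100 for p in percents)
    return sorted(candidates)


def run_sweep(check, max_size: int) -> dict[int, bool]:
    """Apply a pass/fail `check(size)` at every boundary size, in ascending order."""
    return {size: check(size) for size in boundary_sizes(max_size)}


if __name__ == "__main__":
    # Toy stand-in for a real eval: pretend behavior holds up to the limit only.
    results = run_sweep(lambda n: n <= 1000, 1000)
    for size, ok in results.items():
        print(size, "pass" if ok else "FAIL")
```

The sweep deliberately spends most of its budget near 0 and near the limit rather than spacing points evenly, since that is where the method expects failures to cluster.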

When to use it

Use it whenever input size, length, or count varies in production, especially for long-context and structured-output features.

Limitations

Boundaries shift between model versions, so boundary tests need to be re-validated after every model upgrade.
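Because boundaries move, one option is to measure where degradation starts for each version instead of hard-coding a threshold. The sketch below uses binary search and assumes failures are monotonic in input size (once a size fails, every larger size fails). Real degradation is often not monotonic, so treat the result as an estimate. `passes`, `find_boundary`, `shifted_versions`, and the version labels are all hypothetical.

```python
from collections.abc import Callable


def find_boundary(passes: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest size in [lo, hi] for which `passes(size)` is True.

    Assumes `passes(lo)` is True and that failures are monotonic in size.
    """
    if not passes(lo):
        raise ValueError("the lower bound must pass")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if passes(mid):
            lo = mid  # boundary is at mid or above
        else:
            hi = mid - 1  # boundary is strictly below mid
    return lo


def shifted_versions(boundaries: dict[str, int], tolerance: int = 0) -> list[str]:
    """List versions whose boundary moved more than `tolerance` from the previous one.

    `boundaries` maps version labels to measured boundaries in release order
    (dicts preserve insertion order in Python 3.7+).
    """
    items = list(boundaries.items())
    return [
        version
        for (_, prev), (version, cur) in zip(items, items[1:])
        if abs(cur - prev) > tolerance
    ]


if __name__ == "__main__":
    measured = {"v1": find_boundary(lambda n: n <= 730, 0, 1000),
                "v2": find_boundary(lambda n: n <= 512, 0, 1000)}
    print(measured, shifted_versions(measured))
```

Storing one measured boundary per version turns "re-validate after upgrades" into a concrete diff: any version returned by `shifted_versions` needs its boundary tests re-tuned.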

Method yield

Findings: 7
Versions spanned: 6
Yield score: 17

3 Medium · 4 Low

Severity-weighted across the published findings below.

Findings it surfaces (7)

Documented failures this method catches, which serve as evidence that it works.

References & further reading

Lost in the Middle: How Language Models Use Long Contexts
Liu et al. · arXiv · July 6, 2023

Cite this

Qlarify Labs. (2026). Boundary & edge-case testing. Retrieved from https://labs.qlarify.fi/methods/boundary-testing