Other · Established

Smoke testing in CI/CD

A fast, shallow pass on every build — a handful of canonical prompts and health checks — whose only job is to fail loudly on gross breakage before anything deeper runs.

Published June 26, 2026

Evals · Reliability

How it works

Before the expensive evals run, a smoke suite answers one question on every build: is the system fundamentally alive? It runs a few canonical prompts, a model-reachability check, a token-budget sanity check, and one end-to-end happy path. It is cheap, runs in CI on every commit, and exists to catch broken APIs, expired keys, and config regressions before they reach a human reviewer, let alone production. Its value lies in being the first gate, not the deepest one.
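Here is a minimal Python sketch of such a suite. It assumes a hypothetical `ModelClient` with `ping()` and `complete()` methods, which is not any specific provider's SDK. The canonical prompts, the 200-token budget, and the `happy_path` callable are illustrative placeholders. Each numbered check corresponds to one item in the list above. The suite fails fast when the model is unreachable, because nothing downstream can pass without it.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class ModelClient(Protocol):
    """Hypothetical client interface; adapt it to your provider's SDK."""

    def ping(self) -> bool: ...

    def complete(self, prompt: str) -> tuple[str, int]:
        """Return (response_text, tokens_used)."""
        ...


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# Canonical prompt -> substring the answer must contain (illustrative values).
CANONICAL_PROMPTS = {
    "What is 2 + 2? Reply with the number only.": "4",
    "Reply with the single word: ready": "ready",
}


def run_smoke_suite(
    client: ModelClient,
    happy_path: Callable[[], bool],
    token_budget: int = 200,
) -> list[CheckResult]:
    """Run the shallow checks in order, failing fast if the model is unreachable."""
    # 1. Reachability: is the endpoint up and are the credentials valid?
    try:
        reachable = client.ping()
    except Exception as exc:  # e.g. expired key, network error, 5xx
        return [CheckResult("reachability", False, repr(exc))]
    if not reachable:
        return [CheckResult("reachability", False, "ping returned False")]
    results = [CheckResult("reachability", True)]

    for prompt, expected in CANONICAL_PROMPTS.items():
        try:
            text, tokens = client.complete(prompt)
        except Exception as exc:
            results.append(CheckResult(f"canonical: {prompt}", False, repr(exc)))
            continue
        # 2. Canonical prompt: the answer must contain the expected marker.
        ok = expected.lower() in text.lower()
        results.append(CheckResult(f"canonical: {prompt}", ok, text[:80]))
        # 3. Token-budget sanity check: a tiny prompt should stay under budget.
        results.append(
            CheckResult(f"budget: {prompt}", tokens <= token_budget, f"{tokens} tokens")
        )

    # 4. One end-to-end happy path through the whole application.
    try:
        results.append(CheckResult("end-to-end happy path", happy_path()))
    except Exception as exc:
        results.append(CheckResult("end-to-end happy path", False, repr(exc)))
    return results


# Usage with a stub client standing in for a real model.
class StubClient:
    def ping(self) -> bool:
        return True

    def complete(self, prompt: str) -> tuple[str, int]:
        return ("4" if "2 + 2" in prompt else "Ready"), 3


results = run_smoke_suite(StubClient(), happy_path=lambda: True)
failed = [r.name for r in results if not r.passed]
print("smoke: PASS" if not failed else f"smoke: FAIL {failed}")  # prints "smoke: PASS"
```

Each check reduces to a plain boolean, which keeps the suite fast and its failures unambiguous. Anything that needs graded judgment belongs in the deeper evals.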

When to use it

Run it in CI/CD on every build, and treat it as the first gate: slower, probabilistic suites are only worth spending tokens on once it passes.
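The "first gate" rule can be sketched in Python. The sketch assumes each suite can be wrapped as a callable that returns pass or fail. `run_gated`, `smoke_suite`, and `deep_evals` are hypothetical names and are not part of any CI system's API.

```python
from typing import Callable

# A stage is a (name, check) pair; check() returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_gated(stages: list[Stage]) -> int:
    """Run stages cheapest-first and stop at the first failure.

    Returns a process exit code: 0 if every stage passed, 1 otherwise.
    """
    for name, check in stages:
        print(f"[gate] running {name}")
        if not check():
            print(f"[gate] {name} failed; skipping later stages")
            return 1
    print("[gate] all stages passed")
    return 0


# Usage: the expensive suite only runs if the smoke suite is green.
def smoke_suite() -> bool:
    return True  # placeholder for the real smoke checks


def deep_evals() -> bool:
    return True  # placeholder for slower, token-hungry eval suites


exit_code = run_gated([("smoke", smoke_suite), ("deep evals", deep_evals)])
```

In a CI job, the returned code would typically be passed to `sys.exit`. A red smoke run then fails the build before any eval tokens are spent.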

Limitations

Shallow by design — a green smoke run proves the system runs, not that it reasons well. Depth has to come from the evaluation and robustness methods downstream.

Method yield

Findings: 1
Versions spanned: 4
Yield score: 3

Severity breakdown: 1 Medium

Severity-weighted across the published findings below.

Findings it surfaces (1)

Documented failures this method catches — the evidence it works.

Format-constraint violations under strict schemas (Medium)
When asked for strictly formatted output (e.g. JSON conforming to a schema), models emit invalid output or add extra content.
How it found it: A daily CI smoke run on a canonical structured-output prompt catches the deploy where a model update starts breaking the schema.
Tool use
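Below is a minimal sketch of a check that would catch the structured-output finding above. It assumes a canonical prompt that asks for a JSON object with a hypothetical two-field schema: `name` as a string and `count` as an integer. The sketch uses only Python's standard `json` module. `json.loads` rejects prose or code fences around the object. An exact type comparison catches values of the wrong type, including `true`/`false` where an integer is expected, since Python's `bool` subclasses `int`. A production suite would more likely validate against a full JSON Schema.

```python
import json

# Hypothetical schema for the canonical structured-output prompt: field -> type.
EXPECTED_FIELDS = {"name": str, "count": int}


def check_structured_output(raw: str) -> list[str]:
    """Return schema violations for a model response; an empty list means pass."""
    try:
        # Fails on prose before/after the JSON, markdown fences, or truncation.
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return [f"expected a JSON object, got {type(data).__name__}"]

    errors = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field {field!r}")
        # Exact type match, so JSON true/false is not accepted as an int.
        elif type(data[field]) is not expected_type:
            errors.append(f"{field!r} should be {expected_type.__name__}")
    extra = sorted(set(data) - set(EXPECTED_FIELDS))
    if extra:
        errors.append(f"unexpected fields: {extra}")
    return errors


# Usage: one compliant response and two typical schema breakages.
print(check_structured_output('{"name": "smoke", "count": 3}'))
# prints []
print(check_structured_output('Sure! Here it is: {"name": "smoke", "count": 3}'))
# prints ['not valid JSON: Expecting value']
print(check_structured_output('{"name": "smoke", "count": "3", "note": "x"}'))
# prints ["'count' should be int", "unexpected fields: ['note']"]
```

A provider-side model update can land without any commit in the repository. Running this check on a schedule as well as per commit, as the daily run in the finding does, gives it a chance to catch that kind of regression.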

References & further reading

Continuous Integration
Martin Fowler · martinfowler.com · January 18, 2024

Cite this

Qlarify Labs. (2026). Smoke testing in CI/CD. Retrieved from https://labs.qlarify.fi/methods/smoke-testing