Smoke testing in CI/CD
A fast, shallow pass on every build — a handful of canonical prompts and health checks — whose only job is to fail loudly on gross breakage before anything deeper runs.
Published June 26, 2026
How it works
Before the expensive evals, a smoke suite answers one question on every deploy: is the system fundamentally alive? A few canonical prompts, a model-reachability check, a token-budget sanity check, one end-to-end happy path. It is cheap, runs in CI on every commit, and exists to catch broken APIs, expired keys, and config regressions so they never reach a human reviewer — let alone production. Its value is being the first gate, not the deepest one.
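The checks described above can be sketched as a small Python module. This is an illustrative sketch under stated assumptions, not Qlarify Labs' implementation: the `ModelFn` interface (prompt in, response text and token count out), the check names, the canonical prompt, the 50-token ceiling, and `stub_model` are all hypothetical. The end-to-end happy path is omitted; it would be one more check of the same shape.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: prompt in, (response text, tokens used) out.
ModelFn = Callable[[str], Tuple[str, int]]


@dataclass
class Result:
    name: str
    ok: bool
    detail: str = ""


def check_reachable(model: ModelFn) -> Result:
    """Any exception here (bad key, dead endpoint) counts as gross breakage."""
    try:
        model("ping")
    except Exception as exc:
        return Result("model_reachable", False, repr(exc))
    return Result("model_reachable", True)


def check_canonical(model: ModelFn, prompt: str, expected: str) -> Result:
    """A canonical prompt must yield a response containing a known substring."""
    text, _ = model(prompt)
    ok = expected.lower() in text.lower()
    return Result(f"canonical:{prompt[:20]}", ok, "" if ok else f"got {text!r}")


def check_token_budget(model: ModelFn, prompt: str, max_tokens: int) -> Result:
    """Token usage should be positive and under an assumed ceiling."""
    _, used = model(prompt)
    ok = 0 < used <= max_tokens
    return Result("token_budget", ok, f"used={used}, max={max_tokens}")


def run_smoke(model: ModelFn) -> List[Result]:
    results = [check_reachable(model)]
    if not results[0].ok:
        return results  # nothing else is meaningful if the model is unreachable
    prompt = "Reply with the single word OK."  # assumed canonical prompt
    results.append(check_canonical(model, prompt, "ok"))
    results.append(check_token_budget(model, prompt, 50))
    return results


def exit_code(results: List[Result]) -> int:
    """A non-zero code fails the CI job."""
    return 0 if all(r.ok for r in results) else 1


def stub_model(prompt: str) -> Tuple[str, int]:
    """Offline stand-in so the sketch runs without network access."""
    return "OK", len(prompt.split()) + 1
```

In a CI job, an entry point would presumably wrap the real client in a `ModelFn`, print each `Result`, and pass `exit_code(run_smoke(model))` to `sys.exit` so that any failed check fails the build.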
When to use it
In CI/CD on every build, and as the first gate before slower probabilistic suites are worth spending tokens on.
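A minimal sketch of the "first gate" idea, assuming each suite is exposed as a zero-argument callable returning pass or fail. The `run_gated` name and its return shape are hypothetical. The point is only that the slower suites never start, and so spend no tokens, unless the smoke run is green.

```python
from typing import Callable, Sequence, Tuple

Suite = Callable[[], bool]  # hypothetical: returns True when the suite passes


def run_gated(smoke: Suite, slow_suites: Sequence[Suite]) -> Tuple[bool, int]:
    """Run smoke first; return (passed, number of slow suites executed)."""
    if not smoke():
        return False, 0  # gross breakage: skip the expensive suites entirely
    executed = 0
    for suite in slow_suites:
        executed += 1
        if not suite():
            return False, executed  # stop at the first failing slow suite
    return True, executed
```

Most CI systems can express the same ordering declaratively through job dependencies. The function form just makes the control flow explicit.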
Limitations
Shallow by design — a green smoke run proves the system runs, not that it reasons well. Depth has to come from the evaluation and robustness methods downstream.
Method yield
- Findings: 1
- Versions spanned: 4
- Yield score: 3
Severity-weighted across the published findings below.
Findings it surfaces (1)
Documented failures this method catches — the evidence it works.
Cite this
Qlarify Labs. (2026). Smoke testing in CI/CD. Retrieved from https://labs.qlarify.fi/methods/smoke-testing