Other · Established

Smoke testing in CI/CD

A fast, shallow pass on every build — a handful of canonical prompts and health checks — whose only job is to fail loudly on gross breakage before anything deeper runs.

Published June 26, 2026

How it works

Before the expensive evals, a smoke suite answers one question on every deploy: is the system fundamentally alive? A few canonical prompts, a model-reachability check, a token-budget sanity check, one end-to-end happy path. It is cheap, runs in CI on every commit, and exists to catch broken APIs, expired keys, and config regressions so they never reach a human reviewer — let alone production. Its value is being the first gate, not the deepest one.
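The checks listed above can be sketched as a small Python suite. This is a minimal illustration under stated assumptions, not a reference implementation: `call_model` is a hypothetical client function that takes a prompt and returns the reply text with the number of tokens used, and the canonical prompts, the token ceiling, and the optional `happy_path` callable are placeholder choices that a real project would replace with its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical client signature (an assumption, not a real library API):
# takes a prompt, returns (reply_text, tokens_used).
ModelFn = Callable[[str], Tuple[str, int]]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# Illustrative values only; pick prompts with unambiguous expected answers.
CANONICAL_PROMPTS = [
    ("What is 2 + 2? Answer with a number only.", "4"),
    ("Reply with the single word: pong", "pong"),
]
MAX_TOKENS_PER_REPLY = 200


def run_smoke_suite(
    call_model: ModelFn,
    happy_path: Optional[Callable[[], bool]] = None,
) -> List[CheckResult]:
    results: List[CheckResult] = []

    # 1. Reachability: any exception (expired key, broken endpoint) fails here.
    try:
        call_model("ping")
        results.append(CheckResult("reachability", True))
    except Exception as exc:
        results.append(CheckResult("reachability", False, repr(exc)))
        return results  # nothing else is meaningful if the model is unreachable

    # 2. Canonical prompts, each paired with a token-budget sanity check.
    for prompt, expected in CANONICAL_PROMPTS:
        try:
            text, tokens = call_model(prompt)
        except Exception as exc:
            results.append(CheckResult(f"canonical:{expected}", False, repr(exc)))
            continue
        results.append(
            CheckResult(f"canonical:{expected}", expected.lower() in text.lower(), text[:80])
        )
        results.append(
            CheckResult(
                f"token_budget:{expected}",
                0 < tokens <= MAX_TOKENS_PER_REPLY,
                f"{tokens} tokens",
            )
        )

    # 3. One end-to-end happy path, supplied by the application under test.
    if happy_path is not None:
        try:
            ok = bool(happy_path())
            results.append(CheckResult("happy_path", ok))
        except Exception as exc:
            results.append(CheckResult("happy_path", False, repr(exc)))

    return results


def suite_passed(results: List[CheckResult]) -> bool:
    return all(r.passed for r in results)
```

The suite stops after a failed reachability check because every later check would fail for the same reason, and a single clear failure is easier to read in a CI log than a wall of them.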

When to use it

In CI/CD on every build, and as the first gate before slower probabilistic suites are worth spending tokens on.
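The "first gate" ordering can be sketched as a short pipeline driver. The function name `run_pipeline` and the idea of passing suites as callables are assumptions made for illustration; most CI systems express the same dependency in their own job configuration instead.

```python
from typing import Callable, List


def run_pipeline(
    smoke: Callable[[], bool],
    slow_suites: List[Callable[[], bool]],
) -> int:
    """Return a process exit code: 0 if everything passed, 1 otherwise."""
    # Gate: if the smoke suite fails, the slower suites never start,
    # so no tokens are spent evaluating a system that is not alive.
    if not smoke():
        print("smoke gate failed; skipping slower suites")
        return 1

    # Only now run the expensive suites, stopping at the first failure.
    for suite in slow_suites:
        if not suite():
            return 1
    return 0
```

In a CI job the result would typically be passed to `sys.exit(...)`, so that a non-zero code fails the build.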

Limitations

Shallow by design — a green smoke run proves the system runs, not that it reasons well. Depth has to come from the evaluation and robustness methods downstream.

Method yield

Findings: 1 (1 Medium)
Versions spanned: 4
Yield score: 3

Severity-weighted across the published findings below.

Findings it surfaces (1)

Documented failures this method catches — the evidence it works.

Cite this

Qlarify Labs. (2026). Smoke testing in CI/CD. Retrieved from https://labs.qlarify.fi/methods/smoke-testing