
Why Your Team Stopped Trusting Green (and How to Fix It)

By the Qynte team · 6 min read

A failing test tells you something. A flaky test tells you nothing — and teaches your team to ignore everything. The first time an engineer re-runs a red suite "because it's probably just that test again," your automation has started paying negative interest.

Flakiness is a measurement, not an opinion

Stop diagnosing flakiness by vibe. A test is flaky when it produces different outcomes on identical code — which means you detect it statistically, across run history. TestPlus flags tests whose pass/fail pattern can't be explained by code changes, ranked by how often they flip. That ranking is your work queue.
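To make "detect it statistically" concrete, here is a minimal sketch of a flip-rate ranking. It is not how TestPlus computes its ranking. The `(test, commit_sha, passed)` record shape and the function names `flake_scores` and `work_queue` are assumptions made for illustration. The idea is to compare only runs of the same test on the same commit, so an outcome change caused by a code change never counts as a flip.

```python
from collections import defaultdict


def flake_scores(runs):
    """Score each test by how often its outcome flips on identical code.

    runs: iterable of (test_name, commit_sha, passed) tuples in
    chronological order (a hypothetical record shape).
    Returns {test_name: flip_rate}, where flip_rate is the fraction of
    consecutive same-commit run pairs whose outcome differed.
    """
    outcomes_by_key = defaultdict(list)
    for test, sha, passed in runs:
        outcomes_by_key[(test, sha)].append(passed)

    flips = defaultdict(int)
    pairs = defaultdict(int)
    for (test, _sha), outcomes in outcomes_by_key.items():
        # Only compare runs that share a commit: same code, so any
        # change in outcome cannot be explained by a code change.
        for previous, current in zip(outcomes, outcomes[1:]):
            pairs[test] += 1
            if previous != current:
                flips[test] += 1

    return {test: flips[test] / pairs[test] for test in pairs}


def work_queue(runs):
    """Return (test_name, flip_rate) for flaky tests, worst first."""
    scores = flake_scores(runs)
    flaky = [(test, rate) for test, rate in scores.items() if rate > 0]
    return sorted(flaky, key=lambda item: item[1], reverse=True)


history = [
    ("test_login", "abc", True),
    ("test_login", "abc", False),
    ("test_login", "abc", True),
    ("test_cart", "abc", True),
    ("test_cart", "abc", True),
    ("test_cart", "def", False),   # fails only after the code changed
    ("test_cart", "def", False),
    ("test_search", "abc", True),
    ("test_search", "abc", True),
    ("test_search", "abc", False),
]

queue = work_queue(history)
```

In this sample, `test_cart` never appears in the queue: it went red only when the commit changed, which is a real failure rather than a flake. A production version would likely also require a minimum number of same-commit runs before trusting a score.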

Quarantine fast, fix deliberately

The instant a test is confirmed flaky, move it out of the blocking suite into quarantine: it still runs and still records results, but it no longer gates merges. This single policy restores trust in green within days, because green starts meaning something again. Quarantine is not a graveyard, though: cap it (say, 5% of the suite) and treat breaching the cap as a stop-the-line event.
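The quarantine policy can be sketched as a small gate check. The function names, the set of quarantined test names, and the default `cap=0.05` (taken from the 5% figure suggested above) are illustrative assumptions, not a TestPlus API.

```python
def blocking_suite(all_tests, quarantined):
    """Tests that still gate merges: everything not in quarantine."""
    return [test for test in all_tests if test not in quarantined]


def quarantine_status(all_tests, quarantined, cap=0.05):
    """Return (share, breached) for the current quarantine.

    share is the fraction of the suite that is quarantined; breached is
    True when that share exceeds the cap (hypothetical default: 5%).
    """
    if not all_tests:
        raise ValueError("suite is empty")
    in_quarantine = [test for test in all_tests if test in quarantined]
    share = len(in_quarantine) / len(all_tests)
    return share, share > cap


suite = [f"test_{i}" for i in range(100)]

# Two quarantined tests out of 100: under the cap, merges proceed.
share_ok, breached_ok = quarantine_status(suite, {"test_3", "test_7"})

# Six quarantined tests out of 100: over the cap, stop the line.
too_many = {f"test_{i}" for i in range(6)}
share_bad, breached_bad = quarantine_status(suite, too_many)
```

Quarantined tests would still be executed and recorded by a separate, non-blocking job. Only `blocking_suite` decides whether a merge goes through.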

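When you do fix, work through the usual suspects in order: timing waits (sleep-based instead of condition-based), shared test data, order-dependent state, third-party calls without stubs, and animation timing. Fix categories, not individual tests: replacing fixed sleeps with one shared condition-based wait, for example, can repair every test that relied on them.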
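For the first suspect, the difference between a sleep-based and a condition-based wait can be shown with a small polling helper. This is a generic sketch. `wait_until` is a hypothetical name, and most UI automation frameworks ship an equivalent that should be preferred where available.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it returns a truthy value or timeout elapses.

    Returns True as soon as the condition holds, False on timeout.
    Unlike a fixed time.sleep(), this returns early on fast machines
    and gives slow machines the full timeout before giving up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for "the element is visible": becomes true on the third poll.
polls = {"count": 0}


def element_ready():
    polls["count"] += 1
    return polls["count"] >= 3


became_ready = wait_until(element_ready, timeout=2.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.1, interval=0.01)
```

A test that calls `time.sleep(2)` is betting that two seconds is always enough and always necessary. The condition-based version makes neither bet.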

Prevent the next generation

Most flakiness is born, not acquired. Self-healing locators remove the largest single source (UI churn); generated page objects standardise waits; isolated test data removes most of what is left. Teams that adopt these defaults report flake rates under 1%, at which point a red suite means a real bug, and people act on it within minutes instead of re-running it.
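Of these defaults, isolated test data is the easiest to sketch in a few lines. The example below assumes a dict-like `store` standing in for whatever backend the tests write to. The `isolated_user` helper and its record shape are hypothetical. Each test creates its own uniquely named record and removes it afterwards, so tests cannot collide on shared rows or depend on execution order.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def isolated_user(store, prefix="user"):
    """Create a uniquely named user for one test, then clean it up.

    store is any dict-like object (a stand-in for a real backend).
    The random suffix keeps parallel or repeated tests from touching
    each other's data.
    """
    name = f"{prefix}-{uuid.uuid4().hex}"
    store[name] = {"cart": []}
    try:
        yield name
    finally:
        # Clean up even if the test body raised.
        store.pop(name, None)


store = {}
with isolated_user(store) as first, isolated_user(store) as second:
    store[first]["cart"].append("book")
    names_differ = first != second
    second_cart_untouched = store[second]["cart"] == []
    users_during_test = len(store)

users_after_test = len(store)
```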

