A Migration Playbook: Retiring Your Selenium Suite Onto Testing Fleets

A staged playbook for platform teams retiring a brittle Selenium suite onto governed Testing Fleets without opening a coverage gap.

Zof Reliability Team · Engineering & Product

February 3, 2026 · 7 min read · Updated February 3, 2026

01

Why a Selenium suite decays faster than you replace it

Three failure modes compound, and they are structural, not a matter of better-written scripts.

The first is selector brittleness. Selenium tests bind to the DOM, so they encode assumptions about markup that have nothing to do with whether the workflow works. A class rename or a wrapped div breaks a test that was validating correct behavior. Teams respond by writing more defensive selectors, which makes tests harder to read and no more meaningful.

The second is flakiness, and flakiness is corrosive in a specific way. A suite that fails 8% of the time for non-deterministic reasons trains engineers to re-run until green. Once "re-run until green" is the norm, the suite has stopped being a gate. It is theater that occupies CI minutes.

The third is maintenance drag without change-awareness. A Selenium suite does not know what changed. It runs the same assertions whether you touched the payments authorization path or a footer link, so it cannot tell you where the risk actually is. You pay full execution cost for a flat, contextless signal.

The honest read: the problem is not that your scripts are badly written. It is that scripts are the wrong unit of validation for a system that changes continuously. The replacement is not better scripts. It is coordinated agents that plan, execute, observe, and maintain validation as the system evolves: Testing Fleets rather than a static suite.

02

Phase 0: map what the suite actually protects

Before you retire anything, you need to know what it covers, not what someone documented two years ago. Most Selenium suites are an archaeological record. They contain tests for flows that were deprecated, duplicate coverage of the same path under three different names, and conspicuous gaps nobody noticed because no test ever guarded them.

This is the job of the System Graph: a live map of services, dependencies, and CI/CD topology that lets you ask which critical paths exist and which of your existing tests actually touch them. The output of Phase 0 is a coverage map overlaid on the real system, sorted by blast radius.

What to do Monday:

  • Inventory the suite by outcome, not file. Group tests by the user-facing workflow they assert, then mark each as load-bearing, duplicate, or dead.
  • Overlay coverage on the graph. Identify which high-criticality paths (auth, payments, checkout, data export) are guarded, thinly guarded, or unguarded.
  • Quantify the flake tax. Pull the pass rate and re-run frequency per test. Tests that pass only on the third attempt are not coverage; they are candidates for early retirement.
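
The flake-tax step can be sketched in a few lines of Python. This is a minimal illustration, not a Zof API: the `Attempt` record shape, the field names, and the two metrics as defined here (first-attempt pass rate and the share of builds that needed a re-run) are assumptions about what your CI history export looks like.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One execution of one test within one CI build (hypothetical record shape)."""
    test: str
    build: str
    passed: bool


def flake_tax(attempts):
    """Return per-test first-attempt pass rate and re-run frequency."""
    # Group attempts by (test, build), preserving the order they ran in.
    by_build = defaultdict(list)
    for a in attempts:
        by_build[(a.test, a.build)].append(a.passed)

    builds = defaultdict(int)      # builds in which the test ran
    first_pass = defaultdict(int)  # builds where the first attempt passed
    rerun = defaultdict(int)       # builds that needed more than one attempt
    for (test, _build), results in by_build.items():
        builds[test] += 1
        first_pass[test] += int(results[0])
        rerun[test] += int(len(results) > 1)

    return {
        test: {
            "first_attempt_pass_rate": first_pass[test] / n,
            "rerun_frequency": rerun[test] / n,
        }
        for test, n in builds.items()
    }


# Made-up history for illustration only.
history = [
    Attempt("checkout_happy_path", "b1", True),
    Attempt("checkout_happy_path", "b2", True),
    Attempt("login_sso", "b1", False),
    Attempt("login_sso", "b1", False),
    Attempt("login_sso", "b1", True),  # green only on the third attempt
    Attempt("login_sso", "b2", True),
]
report = flake_tax(history)
```

Sorting the report by `rerun_frequency` gives a first cut of the early-retirement candidates; a test that is green only after retries shows up with a low first-attempt pass rate even if the build history looks clean.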

You will almost always find the suite is simultaneously over-built on low-risk flows and under-built on the paths that would actually hurt you. That asymmetry is the migration's first deliverable, and it is useful on its own.

03

Phase 1: run fleets in shadow, keep Selenium authoritative

Do not cut over. Run both. In Phase 1, Testing Fleets execute against the same environments as your Selenium suite, but their verdict is advisory. Selenium remains the gate that can block a release. The fleets are watched, not trusted, until they earn it.

This phase exists to answer one question with evidence: does fleet-based validation catch what the suite catches, plus what it misses, without flooding you with false positives? You are looking for three signals over a few weeks of real diffs.

  • Concordance. When Selenium fails, do the fleets fail for the same underlying reason? Divergence is where you learn something, usually that a Selenium failure was a selector artifact, not a real defect.
  • Net-new catches. Regressions on graph-critical paths the old suite never guarded. This is the coverage you were missing, surfaced before it ships.
  • False-positive rate. A fleet that cries wolf is as useless as a flaky script. Tune until the signal is trustworthy enough to gate on.
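
One way to put numbers on the three signals above, as a sketch under stated assumptions: each change in the shadow window has been triaged by a human, the `ShadowResult` fields are hypothetical, concordance is simplified to "the fleet also failed when Selenium failed" (a real comparison would also match the underlying reason), and the false-positive rate is taken as the share of fleet failures with no real defect behind them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowResult:
    """One change evaluated by both systems, triaged afterwards (hypothetical shape)."""
    change: str
    selenium_failed: bool
    fleet_failed: bool
    real_defect: bool  # triage verdict: was there actually a regression?


def shadow_signals(results):
    selenium_failures = [r for r in results if r.selenium_failed]
    fleet_failures = [r for r in results if r.fleet_failed]

    # Concordance: of the changes Selenium failed, how many did the fleet also fail?
    concordant = [r for r in selenium_failures if r.fleet_failed]
    # Net-new catches: real defects the fleet flagged and Selenium let through.
    net_new = [r for r in fleet_failures if r.real_defect and not r.selenium_failed]
    # False positives: fleet failures that triage found no defect behind.
    false_pos = [r for r in fleet_failures if not r.real_defect]

    return {
        "concordance": (
            len(concordant) / len(selenium_failures) if selenium_failures else None
        ),
        "net_new_catches": len(net_new),
        "false_positive_rate": (
            len(false_pos) / len(fleet_failures) if fleet_failures else None
        ),
    }


# Made-up shadow window for illustration only.
window = [
    ShadowResult("c1", selenium_failed=True, fleet_failed=True, real_defect=True),
    ShadowResult("c2", selenium_failed=True, fleet_failed=False, real_defect=False),
    ShadowResult("c3", selenium_failed=False, fleet_failed=True, real_defect=True),
    ShadowResult("c4", selenium_failed=False, fleet_failed=True, real_defect=False),
    ShadowResult("c5", selenium_failed=False, fleet_failed=False, real_defect=False),
]
signals = shadow_signals(window)
```

In this toy window, `c2` is the interesting divergence: Selenium failed, the fleet did not, and triage found no defect, which is the selector-artifact case described above.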

Because the fleets are anchored to the System Graph, their validation is change-aware: a diff that touches the checkout service pulls focused validation of checkout and its blast radius, instead of a flat re-run of everything. You are not just replacing the suite. You are replacing "run all the scripts and hope" with "validate what this change can actually break."
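
To make "blast radius" concrete, here is a minimal sketch of change-aware scoping as a breadth-first walk over a dependency map. The service names and the `DEPENDENTS` structure are hypothetical stand-ins; this is not how the System Graph is implemented, only an illustration of the idea that a change selects what gets validated.

```python
from collections import deque

# Hypothetical edges: service -> services that depend on it.
DEPENDENTS = {
    "payments": ["checkout"],
    "checkout": ["storefront", "order_email"],
    "storefront": [],
    "order_email": [],
    "footer": [],
}


def blast_radius(changed, dependents):
    """Return the changed services plus everything reachable downstream of them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        service = queue.popleft()
        for downstream in dependents.get(service, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


checkout_scope = blast_radius(["checkout"], DEPENDENTS)
footer_scope = blast_radius(["footer"], DEPENDENTS)
```

A diff touching `checkout` pulls in `storefront` and `order_email`, while a footer change scopes to the footer alone, which is the contrast with a flat re-run of everything.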

04

Phase 2: cut over path by path, retire scripts as you go

Migrate by criticality, highest blast radius first, and retire Selenium tests only when the fleet has demonstrably covered the same path. The retirement criterion has to be explicit, or you will either keep dead scripts forever or delete coverage you still needed.

A defensible cutover gate per path:

  1. Fleet coverage of the path is confirmed against the graph: the workflow and its dependencies are validated, not just the happy click-through.
  2. Concordance held over a representative window: the fleet caught the real regressions Selenium caught, with an acceptable false-positive rate.
  3. The fleet surfaced at least the same critical-path coverage, and ideally net-new catches.
  4. Only then is the corresponding Selenium test moved from authoritative to archived.
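
The gate above can be written down as an explicit check, so that "ready to archive" is a list of blockers rather than a judgment call. This is a sketch: the `PathEvidence` fields are hypothetical, and the 5% false-positive threshold is an assumed placeholder that each team would set for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathEvidence:
    """Evidence gathered for one path over the shadow window (hypothetical shape)."""
    path: str
    graph_coverage_confirmed: bool
    selenium_caught_real: int   # real regressions Selenium caught in the window
    fleet_caught: int           # how many of those the fleet also caught
    false_positive_rate: float
    critical_coverage_at_least_equal: bool


def cutover_blockers(e, max_false_positive_rate=0.05):
    """Return the reasons this path cannot be cut over yet; empty means it can."""
    blockers = []
    if not e.graph_coverage_confirmed:
        blockers.append("fleet coverage not confirmed against the graph")
    if e.fleet_caught < e.selenium_caught_real:
        blockers.append("fleet missed real regressions Selenium caught")
    if e.false_positive_rate > max_false_positive_rate:
        blockers.append("false-positive rate above threshold")
    if not e.critical_coverage_at_least_equal:
        blockers.append("fleet critical-path coverage below Selenium's")
    return blockers


# Made-up evidence for illustration only.
checkout = PathEvidence("checkout", True, 4, 4, 0.02, True)
search = PathEvidence("search", True, 3, 2, 0.10, True)

checkout_blockers = cutover_blockers(checkout)
search_blockers = cutover_blockers(search)
```

Only a path with an empty blocker list has its Selenium test moved from authoritative to archived; anything else stays in shadow with the blockers recorded.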

Sequence matters. Start with paths that are high-criticality and well-understood, where you can judge concordance confidently. Leave the long tail of low-risk, low-flake tests for last; they are cheap to keep running in shadow and carry little urgency. The principle is that coverage never drops below the line you measured in Phase 0. At every step, total effective coverage is Selenium-still-authoritative plus fleet-now-authoritative, and that sum only goes up.
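
The coverage invariant in the last two sentences can be checked mechanically before each cutover step. A minimal sketch, assuming coverage is tracked as a set of path names (the names here are hypothetical):

```python
def effective_coverage(selenium_authoritative, fleet_authoritative):
    """Paths guarded by whichever system is currently authoritative for them."""
    return set(selenium_authoritative) | set(fleet_authoritative)


def coverage_gaps(baseline, selenium_after, fleet_after):
    """Return baseline paths that would be left unguarded after a proposed step."""
    return sorted(set(baseline) - effective_coverage(selenium_after, fleet_after))


# Phase 0 baseline (made-up paths for illustration only).
baseline = {"auth", "checkout", "export"}

# Step A: checkout moves to the fleet, Selenium keeps the rest.
step_a = coverage_gaps(baseline, {"auth", "export"}, {"checkout"})

# Step B: the auth script is archived before the fleet covers auth.
step_b = coverage_gaps(baseline, {"export"}, {"checkout"})
```

Step A leaves no gaps and can proceed; step B would drop `auth` below the Phase 0 line, so the archive is rejected until the fleet is authoritative for that path.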

Consider a hypothetical e-commerce team with 1,400 Selenium tests. Phase 0 reveals roughly 300 guard the revenue path, 600 are duplicates or dead, and the search-and-recommendation flow is barely covered at all. The remaining 500 or so are the low-risk long tail. They migrate the 300 first, archive the 600 outright, keep the long tail running in shadow until last, and let the fleets build the coverage the suite never had. The suite shrinks while the coverage grows, which is the entire point.

05

Phase 3: govern the fleet you now depend on

A fleet that maintains its own validation as the system evolves is more capable than a script suite, which means governance is not optional; it is the engineering. When validation adapts itself, you need to know what changed and why, and you need humans authorizing the consequential moves.

The operating principle is agents propose, humans authorize. When the fleet adapts coverage (retiring a check that no longer maps to real behavior, or adding one for a new path), that adaptation is visible and, where it matters, approved. Governance provides three things: the policy defining what the fleet may change autonomously, the approval step for changes that touch sensitive surfaces, and the audit trail recording who authorized what against which evidence. This matters because industry research finds roughly 80% of developers bypass guardrails that slow them down; governance that lives outside the validation path gets routed around, while governance that *is* the path holds.
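
A toy model of "agents propose, humans authorize" shows how policy, approval, and audit trail fit together. Everything here is an assumption for illustration: the `Proposal` shape, the set of sensitive surfaces, and the rule that retiring a check always needs a human are a hypothetical policy, not Zof's actual one.

```python
from dataclasses import dataclass, field

# Hypothetical policy input: surfaces where any change needs approval.
SENSITIVE_SURFACES = {"auth", "payments"}


@dataclass(frozen=True)
class Proposal:
    action: str    # "add_check" or "retire_check"
    surface: str
    evidence: str  # what the fleet observed that motivated the change


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)


def decide(proposal, log, approver=None):
    """Apply the policy, record the outcome, and return the resulting status."""
    # Assumed policy: retirements and sensitive surfaces require a named human.
    needs_human = (
        proposal.action == "retire_check" or proposal.surface in SENSITIVE_SURFACES
    )
    status = "pending_approval" if needs_human and approver is None else "applied"
    log.entries.append(
        {
            "action": proposal.action,
            "surface": proposal.surface,
            "evidence": proposal.evidence,
            "approver": approver,
            "status": status,
        }
    )
    return status


log = AuditLog()
add_search = Proposal("add_check", "search", "new route observed")
retire_pay = Proposal("retire_check", "payments", "check no longer maps to behavior")

s1 = decide(add_search, log)                    # low-risk addition, autonomous
s2 = decide(retire_pay, log)                    # held until a human signs off
s3 = decide(retire_pay, log, approver="alice")  # authorized and recorded
```

Because the decision and the log entry are produced in the same call, there is no way to apply a change without leaving a record of who authorized it and on what evidence, which is the sense in which governance is the path rather than a step beside it.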

For regulated and security-sensitive teams, validation also has to run inside the customer boundary. Edge Runners execute as signed capsules inside secure enclaves and produce audit-ready evidence, so the migration does not trade Selenium's local execution for a model where test data leaves your perimeter.

The payoff lands in Reliability Analytics: instead of a green build that means "the scripts that still pass, passed," you get an evidence-backed read on whether a release is actually ready, tied to the real system the System Graph maps.

06

The bottom line
