Engineering

Testing Fleets, Not Test Scripts

Governed agentic validation that plans, executes, observes, and maintains checks as your system changes.

Zof Reliability Team · May 3, 2026 · 24 min read · Updated May 19, 2026

Why scripts became the bottleneck

Script libraries grow until no one knows which checks still matter. Flaky tests train teams to ignore failures. Every UI restyle and API version bump creates maintenance work unrelated to risk reduction.

The bottleneck is not test authoring but test operations: deciding what to run, keeping selectors and flows current, and interpreting results in the context of the change that triggered the run.

What a testing fleet is

A testing fleet is a set of governed agents coordinated to perform validation work as a system, not a bag of disconnected scripts. The fleet plans work from System Graph context, executes across surfaces, observes outcomes, and maintains assets over time.

Fleets are policy-bound: which environments they may touch, what data they may use, how long they may run, and what evidence they must produce.
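As a rough illustration of what "policy-bound" could mean in practice, the sketch below checks a proposed run against the four policy dimensions named above. The names FleetPolicy, RunRequest, and policy_violations are hypothetical and chosen for this example only; they are not a Zof API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetPolicy:
    """Hypothetical policy record covering the four dimensions in the text."""
    allowed_environments: frozenset[str]
    allowed_data_classes: frozenset[str]
    max_runtime_seconds: int
    required_evidence: frozenset[str]


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical description of a run an agent wants to start."""
    environment: str
    data_class: str
    estimated_runtime_seconds: int
    evidence_planned: frozenset[str]


def policy_violations(policy: FleetPolicy, request: RunRequest) -> list[str]:
    """Return human-readable violations; an empty list means the run is allowed."""
    violations = []
    if request.environment not in policy.allowed_environments:
        violations.append(f"environment '{request.environment}' is not allowed")
    if request.data_class not in policy.allowed_data_classes:
        violations.append(f"data class '{request.data_class}' is not allowed")
    if request.estimated_runtime_seconds > policy.max_runtime_seconds:
        violations.append("estimated runtime exceeds the policy limit")
    missing = policy.required_evidence - request.evidence_planned
    if missing:
        violations.append("missing evidence: " + ", ".join(sorted(missing)))
    return violations
```

Returning a list of violations rather than a single boolean keeps the refusal explainable, which matters when the policy itself is subject to governance review.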

Testing fleet workflow

Plan (impact + risk) → Execute (UI/API/integration/…)
        → Observe (telemetry + artifacts)
        → Maintain (update flows, retire noise)

Agent roles inside a fleet

Core roles

  • Planner: selects targets from change impact and risk score
  • Executor: runs checks with environment and data policy
  • Observer: captures artifacts, traces, and failure signatures
  • Maintainer: updates or retires checks when the graph changes
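The four roles can be read as stages of a single cycle. The following is a minimal sketch of that cycle, assuming each check is tagged with the graph node it validates; Check, Result, and run_cycle are illustrative names, and a real fleet would run each role as a separate governed agent rather than as inline code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    target: str                # graph node the check validates (assumed field)
    run: Callable[[], bool]    # returns True when the check passes


@dataclass(frozen=True)
class Result:
    check: str
    passed: bool


def run_cycle(checks: list[Check], changed: set[str], live_targets: set[str]):
    """One Plan -> Execute -> Observe -> Maintain pass over a list of checks."""
    # Planner: select only checks whose target was touched by the change.
    selected = [c for c in checks if c.target in changed]
    # Executor and Observer: run each selected check and record the outcome.
    results = [Result(c.name, c.run()) for c in selected]
    # Maintainer: keep only checks whose target still exists in the graph.
    kept = [c for c in checks if c.target in live_targets]
    return results, kept
```

For example, with checks on "auth", "checkout", and a removed "legacy" node, a change to "checkout" runs only the checkout check and drops the legacy check from the maintained set.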

UI, API, integration, desktop, accessibility, security, and release testing

Enterprise applications are multi-surface. A fleet coordinates UI flows, contract tests, integration paths, desktop clients, accessibility rules, security smoke checks, and release-readiness gates without treating each surface as an island.

Release readiness is a fleet outcome: evidence that critical workflows behave as expected for this change, not a green checkbox on an unrelated suite.
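One way to make "evidence for this change" concrete is a gate that ignores results from other changes. The sketch below assumes evidence arrives as simple records with workflow, change_id, and passed fields; that record shape and the function name unready_workflows are assumptions for illustration.

```python
def unready_workflows(critical: set[str], evidence: list[dict], change_id: str) -> list[str]:
    """Return critical workflows that lack passing evidence for this change.

    A workflow counts as ready only if it has at least one passing record
    and no failing record tied to the given change.
    """
    relevant = [e for e in evidence if e["change_id"] == change_id]
    passed = {e["workflow"] for e in relevant if e["passed"]}
    failed = {e["workflow"] for e in relevant if not e["passed"]}
    return sorted(w for w in critical if w not in passed or w in failed)
```

An empty result means every critical workflow has passing evidence for the change; a green result recorded against a different change does not count.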

How fleets use System Graph context

The graph answers: what changed, what depends on it, which workflows are critical, and which incidents historically touched this area. Fleets use those answers to scope work.

Instead of "run 4,000 tests," the fleet runs the 40 that matter for this merge, and documents why each ran.
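The scoping step can be sketched as a breadth-first walk over a dependency graph that records why each node is in scope. The graph shape here (a mapping from a node to the nodes that depend on it) and the function names are assumptions; the System Graph's actual data model is not described in this article.

```python
from collections import deque


def impacted_nodes(dependents: dict[str, list[str]], changed: set[str]) -> dict[str, str]:
    """Map every node reachable from the change to the reason it is in scope."""
    reasons = {node: "changed directly" for node in changed}
    queue = deque(sorted(changed))
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in reasons:  # first path found wins; also guards against cycles
                reasons[dep] = f"depends on {node}"
                queue.append(dep)
    return reasons


def select_checks(
    checks_by_node: dict[str, list[str]],
    dependents: dict[str, list[str]],
    changed: set[str],
) -> dict[str, str]:
    """Return the checks to run, each paired with the reason it was selected."""
    reasons = impacted_nodes(dependents, changed)
    return {
        check: reasons[node]
        for node, checks in checks_by_node.items()
        if node in reasons
        for check in checks
    }
```

Keeping the reason next to each selected check is what lets the fleet document why a check ran, rather than only that it ran.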

How fleets reduce maintenance burden

Maintainers update flows when the graph detects structural change: new API routes, renamed screens, or altered workflows. Noise is retired when checks no longer map to risk.

Humans set maintenance policy; agents perform the repetitive updates and flag ambiguous cases for review.
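The split between automatic updates and human review might look like the sketch below. It assumes the graph diff supplies the set of nodes that still exist and, for each node that disappeared, a list of candidate replacements; MaintenanceAction and plan_maintenance are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaintenanceAction:
    check: str
    action: str       # "keep", "update", "review", or "retire"
    detail: str = ""  # new target for "update", candidates for "review"


def plan_maintenance(
    check_targets: dict[str, str],
    live_nodes: set[str],
    renames: dict[str, list[str]],
) -> list[MaintenanceAction]:
    """Decide what to do with each check after the graph changes."""
    actions = []
    for check, target in sorted(check_targets.items()):
        if target in live_nodes:
            actions.append(MaintenanceAction(check, "keep"))
            continue
        candidates = renames.get(target, [])
        if len(candidates) == 1:
            # Unambiguous rename: safe for an agent to apply on its own.
            actions.append(MaintenanceAction(check, "update", candidates[0]))
        elif candidates:
            # Several plausible replacements: flag for a human to decide.
            actions.append(MaintenanceAction(check, "review", ", ".join(candidates)))
        else:
            # Target is gone with no replacement: the check no longer maps to risk.
            actions.append(MaintenanceAction(check, "retire"))
    return actions
```

Under this assumed policy, only the single-candidate case is updated without review; whether that threshold is appropriate is exactly the kind of maintenance policy humans would set.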

Evidence and telemetry

Enterprise buyers need proof, not logs buried in CI. Fleets attach artifacts, traces, screenshots, request captures, and structured failure signatures to the change that triggered the run.

Telemetry feeds reliability analytics: flaky-rate trends, mean time to reproduce, and release delay attributable to validation.
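As an example of one such metric, the sketch below computes a flaky rate from run records. It assumes a working definition that the article does not state: a check is flaky if it both passed and failed on the same commit. The record shape (check, commit, passed) is also an assumption.

```python
from collections import defaultdict
from typing import Iterable


def flaky_rate(runs: Iterable[tuple[str, str, bool]]) -> float:
    """Fraction of checks that both passed and failed on at least one commit.

    Each run is a (check, commit, passed) tuple.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for check, commit, passed in runs:
        outcomes[(check, commit)].add(passed)
    checks = {check for check, _ in outcomes}
    flaky = {check for (check, _), seen in outcomes.items() if len(seen) == 2}
    return len(flaky) / len(checks) if checks else 0.0
```

Grouping by commit separates flakiness from genuine regressions: a check that fails consistently on one commit is a real signal, not noise.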

How QA teams should adopt testing fleets

Start with one critical workflow and define release-ready evidence. Pair fleet policies with existing CI gates. Expand surface coverage as confidence grows.

QA owns outcomes and policies; fleets own operational execution. This is an evolution of the QA role, not a headcount-replacement story.

Practical migration path

90-day migration

  1. Inventory top workflows and current regression pain
  2. Model workflows in the System Graph
  3. Pilot a fleet on one service or product line
  4. Compare escaped defects and maintenance hours over 6 to 8 weeks
  5. Expand policies and surfaces with governance review

Final takeaway

Testing fleets treat validation as an operated system. Scripts remain useful as assets that fleets maintain, not as the architecture that entire enterprises depend on.
