
Change Impact Analysis: How One Commit Becomes a Targeted Test Plan

How a single commit becomes a targeted test plan: tracing change impact through the system graph to downstream consumers, suggested tests, and known failure zones.

Zof Reliability Team · Engineering & Product

December 10, 2025 · 8 min read · Updated December 10, 2025

01

What change impact analysis actually computes

Change impact analysis (CIA) answers a specific question: given this diff, what is the set of behaviors that could plausibly change, and what is the minimum test set that proves they did not? It is not "run tests related to the changed file." File-level heuristics are too coarse in a distributed system, where the consequential blast radius is rarely contained in the repo that changed.

A useful CIA produces three artifacts, in order:

  • A reachability set: the downstream consumers, contracts, and CI/CD paths a change can actually reach. Reachability is the difference between "this function changed" and "this function is on a path that serves checkout."
  • A targeted test plan: the specific suites, integration paths, and contract checks that exercise that reachability set, ranked by relevance, not run alphabetically.
  • A historical risk overlay: which of those surfaces have failed before, so the plan weights toward known fragile zones instead of treating every path as equally trustworthy.

The first two are necessary. The third is what separates a credible plan from a clever guess, and most homegrown CIA stops before it gets there.
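One way to make the three artifacts concrete is to model a CIA run's output as a single typed result. This is a minimal sketch assuming Python 3.9+ dataclasses; `ImpactReport`, `PlannedCheck`, and every field name are hypothetical, chosen for illustration rather than taken from any Zof API. Ranking uses relevance first and breaks ties toward nodes with more recorded failures, which is one simple way for the overlay to shape the order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlannedCheck:
    """One entry in the targeted test plan (hypothetical shape)."""
    suite: str        # a unit suite, integration path, or contract check
    target: str       # the node in the reachability set this check exercises
    relevance: float  # higher means more directly tied to the change


@dataclass
class ImpactReport:
    """The three CIA artifacts, produced in order."""
    # 1. Reachability set: every node the change can actually reach.
    reachable: set[str] = field(default_factory=set)
    # 2. Candidate checks for the targeted test plan.
    plan: list[PlannedCheck] = field(default_factory=list)
    # 3. Historical risk overlay: prior failure counts per node.
    failure_history: dict[str, int] = field(default_factory=dict)

    def ranked_plan(self) -> list[PlannedCheck]:
        # Drop checks that exercise nothing the change can reach.
        in_scope = [c for c in self.plan if c.target in self.reachable]
        # Rank by relevance, then by failure history, instead of alphabetically.
        return sorted(
            in_scope,
            key=lambda c: (c.relevance, self.failure_history.get(c.target, 0)),
            reverse=True,
        )
```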

02

Why a graph, not a folder, is the unit of analysis

The reason teams keep running full regression on every change is that the alternative requires a model most teams do not have: a live, accurate map of how the system is wired right now. Static diagrams rot the moment they are drawn. A dependency file lists declared dependencies, not the runtime call paths that actually carry load. Without a current model, "targeted" testing is a euphemism for "tested the parts we remembered."

This is the job of a System Graph: a live dependency and context map of services, dependencies, and CI/CD that makes validation change-aware. The graph is what lets a commit become a query. You give it a diff; it returns the subgraph that diff touches, including the consumers two and three hops away that no developer holds in their head.
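The "commit becomes a query" idea can be sketched as a breadth-first search over reverse dependency edges: start from the changed nodes and walk outward to everything that consumes them, recording how many hops away each consumer sits. This minimal sketch assumes the graph is a plain dict mapping each node to the nodes that call or consume it; a real System Graph would carry far richer edge data, and the node names in the usage note are hypothetical.

```python
from collections import deque


def reachability(consumers: dict[str, list[str]], changed: set[str]) -> dict[str, int]:
    """Return every node reachable from the changed nodes, with its hop distance.

    consumers[x] lists the nodes that call or consume x (reverse dependency edges).
    Changed nodes themselves are at distance 0.
    """
    dist = {node: 0 for node in changed}
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, []):
            if consumer not in dist:  # first visit in BFS is the shortest hop count
                dist[consumer] = dist[node] + 1
                queue.append(consumer)
    return dist
```

For example, `reachability({"parse_amount": ["ledger-svc"], "ledger-svc": ["billing-api"], "billing-api": ["invoice-job"]}, {"parse_amount"})` assigns `invoice-job` a distance of 3: a consumer three hops away that a file-level view would never list.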

Consider, hypothetically, a fintech SaaS team that bumps a serialization library used by an internal accounts service. At the file level, the change is trivial and the owning team's tests are green. In the graph, that library sits on the response path for a balances API consumed by three other services, one of which feeds a regulated statement-generation job. The graph surfaces that path. Folder-based test selection never would, because the risk lives entirely in services the committing developer does not own and never opened.
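The hypothetical fintech case can be encoded directly. In this sketch, made-up node names stand in for the serialization library, the accounts service, its balances API, the three consuming services, and the statement job. The walk keeps a parent pointer for each node so the plan can show why the statement job is in scope, not only that it is.

```python
from collections import deque
from typing import Optional

# Hypothetical reverse-dependency edges: each key is consumed by the nodes in its list.
CONSUMERS = {
    "serde-lib": ["accounts-svc"],
    "accounts-svc": ["balances-api"],
    "balances-api": ["svc-a", "svc-b", "svc-c"],
    "svc-c": ["statement-job"],  # the regulated statement-generation job
}


def impact_path(consumers: dict[str, list[str]], changed: str, target: str) -> Optional[list[str]]:
    """Return the shortest chain of consumers from `changed` to `target`, or None if unreachable."""
    parent: dict[str, Optional[str]] = {changed: None}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if node == target:
            path: list[str] = []
            current: Optional[str] = node
            while current is not None:  # walk parent pointers back to the change
                path.append(current)
                current = parent[current]
            return path[::-1]
        for consumer in consumers.get(node, []):
            if consumer not in parent:
                parent[consumer] = node
                queue.append(consumer)
    return None
```

Here `impact_path(CONSUMERS, "serde-lib", "statement-job")` returns `['serde-lib', 'accounts-svc', 'balances-api', 'svc-c', 'statement-job']`: the regulated job sits four hops from the change, which is exactly the kind of path a folder-scoped selection rooted at the library's repository cannot see.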

This is also why reachability matters beyond test selection. Prioritizing work by what is actually reachable in the live graph, rather than triaging a flat list of findings, is the basis for published claims that reachability-based prioritization cuts exploitable exposure by 70-90%: you stop spending effort on code paths nothing actually calls.
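The same reachability set can reorder a findings list. A minimal sketch, assuming each finding names the symbol it affects; the `Finding` type and its fields are hypothetical. Findings on symbols that no live path reaches are deferred rather than deleted, so they can be revisited if a new edge makes them reachable later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    id: str
    symbol: str    # the function or component the finding affects (hypothetical field)
    severity: int  # higher is worse


def prioritize(findings: list[Finding], reachable: set[str]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (act now, deferred) by live-graph reachability."""
    # Reachable findings compete for attention, most severe first.
    act_now = sorted(
        (f for f in findings if f.symbol in reachable),
        key=lambda f: f.severity,
        reverse=True,
    )
    # Unreachable findings are kept, just not worked on yet.
    deferred = [f for f in findings if f.symbol not in reachable]
    return act_now, deferred
```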

03

From commit to test plan, step by step

Here is the loop that turns one commit into a governed, targeted plan. It maps directly onto Zof's operating model (Understand, Test, Reproduce, Remediate, Verify), and the first two stages are where CIA lives.

1. Understand the change in context. The graph resolves the diff into a reachability set: changed symbols, the services that call them, the contracts crossed, and the CI/CD stages involved. The output is a scoped subgraph, not a file list.

2. Derive the targeted plan. Testing Fleets plan, execute, observe, and maintain validation as the system evolves, rather than running static scripts. Given the subgraph, they assemble the relevant suites: the unit tests on the changed code, the integration paths that cross the affected contracts, and the consumer-side checks for downstream services. Crucially, this is a plan the fleet maintains as the system changes, so it does not silently drift out of date the way a hand-curated tag-based selection does.

3. Overlay historical failure zones. Reliability Analytics weights the plan toward surfaces with a failure history: paths that flaked, contracts that broke on prior changes, services with a record of regressions under load. A path that has failed three times in six months earns deeper validation than one that has never moved.

4. Execute and verify with evidence. The plan runs, and the result is a verdict the control plane can act on: pass, fail, or a reproduced regression with a deterministic case attached. That evidence is the audit-ready record of what was checked and why it was sufficient.
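Steps 2 and 3 can be sketched as one small function: collect the suites that exercise each node in the reachability set, then rank and deepen them by recent failure history. Everything here is a hypothetical illustration, assuming a static `suites_for` index and a list of failure timestamps per suite; in the loop above, Testing Fleets would maintain that index and Reliability Analytics would supply the history. The 180-day window and the three-failure threshold are assumptions chosen to mirror the "three times in six months" example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlanEntry:
    suite: str
    failures_in_window: int
    deep: bool  # run extended variants (more iterations, load, edge cases)


def derive_plan(
    reachable: set[str],
    suites_for: dict[str, list[str]],
    failures: dict[str, list[datetime]],
    now: datetime,
    window: timedelta = timedelta(days=180),
    deep_threshold: int = 3,
) -> list[PlanEntry]:
    """Step 2: select suites covering reachable nodes. Step 3: weight by failure history."""
    selected = {s for node in reachable for s in suites_for.get(node, [])}
    plan = []
    for suite in selected:
        # Count only failures inside the recent window.
        recent = sum(1 for t in failures.get(suite, []) if now - t <= window)
        plan.append(PlanEntry(suite, recent, deep=recent >= deep_threshold))
    # Most failure-prone first; ties broken by name so the order is stable.
    return sorted(plan, key=lambda e: (-e.failures_in_window, e.suite))
```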

The shift is from "what tests do we have?" to "what does this change require us to prove?" The plan is derived from the change, not the other way around.

04

The coverage trap, and how targeting avoids it

The instinctive objection from any test lead is the right one: targeting sounds like a polite word for skipping tests, and skipped tests are how regressions ship. That objection is correct about naive targeting and wrong about graph-driven targeting, and the distinction is worth being precise about.

Naive targeting reduces scope by guessing: by file proximity, by author, or from a stale ownership map. It loses coverage because its model of impact is wrong. Graph-driven targeting reduces scope by *proving* a path is unreachable from the change. You are not declining to test a surface you suspect is affected; you are declining to re-test a surface the live graph shows the change cannot reach. Those are entirely different risk postures.

Two guardrails keep targeting honest:

  • Conservatism on uncertainty. When the graph cannot confidently establish reachability (a dynamic dispatch, a reflection boundary, a newly added edge), the plan must widen, not narrow. Targeting that fails toward more testing is safe; targeting that fails toward less is the trap.
  • Coverage as a graph property, not a percentage. The honest coverage question is not "what percent of lines ran" but "did we validate every reachable consumer of this change?" The graph can answer the second question. A line-coverage number cannot.
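Both guardrails fit in a few lines. This sketch assumes each edge carries a hypothetical `confident` flag that is false for dynamic dispatch, reflection, or a newly added edge. Crossing an uncertain edge sets `must_widen`, and `select_suites` then falls back to every suite, the bluntest possible way to fail toward more testing; a real system might widen only around the uncertain edge. Coverage is reported as the share of reachable consumers with a passing validation, not as a line percentage.

```python
from collections import deque
from typing import NamedTuple


class Edge(NamedTuple):
    consumer: str
    confident: bool  # False for dynamic dispatch, reflection, or a newly added edge


def reach_with_uncertainty(edges: dict[str, list[Edge]], changed: set[str]) -> tuple[set[str], bool]:
    """Walk consumer edges from the changed nodes; return (reachable, must_widen)."""
    seen = set(changed)
    must_widen = False
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for edge in edges.get(node, []):
            if not edge.confident:
                must_widen = True  # uncertainty widens the plan, never narrows it
            if edge.consumer not in seen:
                seen.add(edge.consumer)
                queue.append(edge.consumer)
    return seen, must_widen


def select_suites(reachable: set[str], must_widen: bool,
                  suites_for: dict[str, list[str]], all_suites: set[str]) -> set[str]:
    """Fail toward more testing: any uncertainty means run everything."""
    if must_widen:
        return set(all_suites)
    return {s for node in reachable for s in suites_for.get(node, [])}


def graph_coverage(reachable: set[str], validated: set[str], changed: set[str]) -> float:
    """Fraction of reachable consumers (excluding the changed nodes) with a passing validation."""
    consumers = reachable - changed
    if not consumers:
        return 1.0
    return len(consumers & validated) / len(consumers)
```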

This matters more every quarter. With roughly 41% of code now AI-generated, and industry research putting the share of AI coding tasks that introduce critical flaws near 45%, the volume of change is rising faster than any team can keep up with through full regression runs. Targeting is no longer an optimization. It is the only way to keep proof ahead of velocity.

05

Where governance and humans stay in the loop

Targeting decides what to test. It should not unilaterally decide what is safe to ship. The principle is consistent across the loop: agents propose, humans authorize. The Testing Fleet proposes a plan and produces a verdict; Governance is where policy decides which verdicts can auto-proceed and which require a person.

A sensible policy is graph-shaped. A change whose reachability set stays inside one low-risk service with a clean history can clear on a green targeted plan. A change that reaches a regulated path, a payments surface, or a service with a poor failure record routes to human authorization regardless of how green the plan looks. Reliability becomes the default for the routine case, and human judgment is reserved for the genuine risk, not spent rubber-stamping every merge. That is also the answer to the policy-bypass problem: when about 80% of developers route around advisory guardrails, the fix is gates that are derived from real impact and are unavoidable, not wiki pages that are easy to skip.
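A graph-shaped policy reduces to a small decision function. This sketch assumes hypothetical node tags (`regulated`, `payments`) and a per-service regression count, and reads "clean history" as zero recorded regressions. Anything outside the routine case, including a reach that spans more than one service, goes to a person, which is stricter than the text strictly requires. A green plan is necessary for auto-proceed but never sufficient on its own.

```python
from enum import Enum


class Route(Enum):
    AUTO_PROCEED = "auto-proceed"
    HUMAN_AUTHORIZATION = "human-authorization"


# Hypothetical tags marking surfaces that always need a person.
SENSITIVE_TAGS = frozenset({"regulated", "payments"})


def route_change(
    reachable_services: set[str],
    tags: dict[str, set[str]],
    regressions: dict[str, int],
    plan_passed: bool,
) -> Route:
    """Agents propose, humans authorize: only the routine case clears automatically."""
    if not plan_passed:
        # A failing verdict is never auto-proceeded; in this sketch a person decides next steps.
        return Route.HUMAN_AUTHORIZATION
    single_service = len(reachable_services) <= 1
    sensitive = any(tags.get(s, set()) & SENSITIVE_TAGS for s in reachable_services)
    clean_history = all(regressions.get(s, 0) == 0 for s in reachable_services)
    if single_service and not sensitive and clean_history:
        return Route.AUTO_PROCEED
    # Regulated or payments reach, a poor record, or multi-service reach:
    # route to a person regardless of how green the plan looks.
    return Route.HUMAN_AUTHORIZATION
```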

06

What to do Monday morning

You can pressure-test this without changing your CI:

  1. Pick your last five cross-service incidents. For each, check whether the regression lived in a service the committing developer owned. The ones that did not are exactly what file-based selection misses and what a graph would have caught.
  2. Audit one "safe" change that broke something. Trace the real path from the diff to the failure. If the path crossed two or more services, your current selection model is blind to your actual risk.
  3. Find your most-skipped slow suite and ask why it runs at all. If it catches nothing on most changes, it is a candidate for graph-driven scoping: run it when the change reaches it, not on every change.

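Checks 1 and 2 can be run as a short script over incident records. This sketch assumes you can export, per incident, the committing team, the owner of the service where the regression surfaced, and the services on the traced path from the diff to the failure; the `Incident` fields are hypothetical and will need mapping to whatever your incident tracker actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    id: str
    committing_team: str
    failing_service_owner: str
    path: tuple[str, ...]  # services traced from the diff to the failure, in order


def audit(incidents: list[Incident]) -> dict[str, list[str]]:
    """Flag incidents that file-based selection would likely have missed."""
    return {
        # Check 1: the regression lived in a service the committer did not own.
        "outside_ownership": [i.id for i in incidents
                              if i.failing_service_owner != i.committing_team],
        # Check 2: the traced path touched two or more distinct services.
        "multi_service_path": [i.id for i in incidents if len(set(i.path)) >= 2],
    }
```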
07

The bottom line
