
Risk Follows Dependencies, Not Folders: Rethinking Where to Test First

Incidents travel along dependency edges, not directory trees. Why test prioritization should follow graph centrality and reachability, not folders or team boundaries.


Zof Reliability Team · Engineering & product

July 15, 2025 · 7 min read · Updated July 15, 2025

Summary

Most test prioritization is organized around the wrong map. Teams decide where to test first based on which folder changed, which team owns it, or which suite is fastest to run. But incidents do not respect directory trees or org charts. They travel along dependency edges, and they concentrate where the graph is most central. If you are an SRE trying to spend finite validation budget well, the question is not "what changed?" It is "what does the change reach, and what reaches it?"

A repository's directory structure is an artifact of how humans like to file things, not a model of how failure propagates at runtime.
If edges carry risk, then some nodes carry far more than others, and graph theory gives us the vocabulary to say which.
Graph-driven prioritization has always been the correct idea.

The folder is a lie your repo tells you

A repository's directory structure is an artifact of how humans like to file things, not a model of how failure propagates at runtime. Two files in the same folder can be operationally unrelated. Two services in different repos, owned by different teams, can be a single shared connection pool away from taking each other down.

When prioritization is folder-driven, you get predictable distortions. The code that changes most often gets the most tests, regardless of whether it is load-bearing. A sleepy module with one commit a quarter gets neglected, even when half the platform depends on it. Test budget pools around churn, not consequence.

Team-driven prioritization fails the same way for a different reason. Org boundaries are political, not architectural. The most dangerous changes are frequently the ones that cross a team line, because that is exactly where shared assumptions go unstated and contracts go unverified. A change owned by Team A can detonate in a service owned by Team C, and neither team's local test suite is looking at the seam between them.

The honest unit of risk is the edge. A dependency edge is a promise: this caller assumes that callee behaves a certain way. Most incidents are a broken promise on an edge somebody forgot was load-bearing.

Centrality, blast radius, and reachability

If edges carry risk, then some nodes carry far more than others, and graph theory gives us the vocabulary to say which.

Centrality measures how much of the system's behavior routes through a node. A shared auth service, a primary datastore, a feature-flag service, a message broker: these sit on a huge fraction of paths. A regression here is not a local bug. It is a system event.
Blast radius is the set of nodes reachable downstream from a change. It answers "if this breaks, who breaks with it?" Two changes of identical code complexity can have wildly different blast radii depending on where they sit.
Reachability is the inverse question, and the sharper lens for risk: not what the change reaches, but what reaches the change. A flaw only matters if there is a live path that exercises it. A vulnerability in a code path no production caller reaches is a finding, not an incident. One in a path that every checkout hits is a fire.
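To make the three terms concrete, here is a minimal sketch over a toy dependency graph. Everything in it is an assumption for illustration: the service names, the `DEPENDS_ON` map, and especially `centrality_proxy`, which is a crude stand-in (share of nodes that transitively depend on a service) rather than a formal centrality measure such as betweenness.

```python
from collections import deque

# Hypothetical dependency graph: caller -> set of callees it depends on.
DEPENDS_ON = {
    "checkout": {"payments", "auth"},
    "payments": {"rate_limiter", "ledger_db"},
    "auth": {"rate_limiter"},
    "partner_api": {"auth"},
    "rate_limiter": set(),
    "ledger_db": set(),
    "legacy_export": {"ledger_db"},  # nothing in production calls this
}

def reachable(graph, start):
    """Return every node reachable from `start` by following edges (start excluded)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(graph):
    """Flip every edge so the map reads callee -> callers."""
    inverted = {node: set() for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, set()).add(caller)
    return inverted

DEPENDENTS = invert(DEPENDS_ON)

def blast_radius(node):
    """Everything that transitively depends on `node`: who breaks with it."""
    return reachable(DEPENDENTS, node)

def centrality_proxy(node):
    """Crude centrality stand-in: share of the other nodes that depend on `node`."""
    return len(blast_radius(node)) / (len(DEPENDS_ON) - 1)

def is_live(node, entry_points):
    """True if some production entry point can reach `node`."""
    return any(node == e or node in reachable(DEPENDS_ON, e) for e in entry_points)
```

With `checkout` and `partner_api` as the assumed production entry points, `rate_limiter` has a blast radius of four services and is live, while `legacy_export` has an empty blast radius and no live path to it.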

Reachability is where the leverage is. Industry research on reachability-based prioritization points to 70 to 90 percent less exploitable exposure, not because the flaws disappear, but because you stop spending equal attention on unreachable noise and start spending it on the paths that can actually be exercised. The same logic that sharpens security triage sharpens reliability triage. The graph tells you which edges are live.
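As a rough sketch of what reachability-aware triage means in practice, the snippet below splits a flat list of findings into those on a live path and those that are not. The call map, entry points, and finding records are all hypothetical, and a real system would derive liveness from traces or a maintained graph rather than a hand-written dictionary.

```python
def live_components(calls, entry_points):
    """Collect every component that some production entry point can reach."""
    live, stack = set(), list(entry_points)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(calls.get(node, ()))
    return live

def triage(findings, calls, entry_points):
    """Split findings into (urgent, parked) by whether their component is live."""
    live = live_components(calls, entry_points)
    urgent = [f for f in findings if f["component"] in live]
    parked = [f for f in findings if f["component"] not in live]
    return urgent, parked

# Hypothetical inputs: caller -> callees, and a flat list of findings.
calls = {
    "checkout": ["payments"],
    "payments": ["tax_lib"],
    "admin_cli": ["report_gen"],  # not reachable from production traffic
}
findings = [
    {"id": "F-1", "component": "tax_lib"},
    {"id": "F-2", "component": "report_gen"},
    {"id": "F-3", "component": "payments"},
]

urgent, parked = triage(findings, calls, entry_points=["checkout"])
```

Parked findings are not deleted. They simply stop competing for the same attention as the ones a checkout request can actually exercise.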

Why this is now urgent, not academic

Graph-driven prioritization has always been the correct idea. What changed is that the cost of getting it wrong went vertical.

Roughly 41 percent of codebases are now AI-generated, and industry research puts the rate at which AI coding tasks introduce critical flaws or security issues near 45 percent. That combination matters for one specific reason: AI-generated code does not respect your folder conventions or your team boundaries either. It produces volume across the whole graph, and it introduces edges (new calls, new dependencies, new assumptions) faster than humans can manually track them.

Meanwhile, an estimated 80 percent of developers bypass policy and guardrails when those guardrails are advisory. So the controls that might have caught a risky edge are being routed around at the same time the edges are multiplying. The cost of poor software quality, estimated at $2.41 trillion in the US in 2022, is in significant part the bill for changes whose true reach nobody computed before they shipped.

You cannot out-staff this with more reviewers staring at more diffs. The volume is structural. The only durable answer is to make the dependency graph a first-class input to where you validate, and to keep that graph current automatically as the system changes.

What graph-driven prioritization actually looks like

Concretely, prioritizing by graph means three shifts from how most teams operate today.

1. Rank by reachable consequence, not by churn or ownership. When a change lands, the first question the system should answer is which downstream nodes it can reach and how central those nodes are. A one-line change to a leaf utility with no live callers is low priority no matter how much it churns. A dependency bump on a service that sits on the critical path of three revenue flows is high priority even if the diff is trivial. This requires a live map of services, dependencies, and CI/CD, not a stale architecture diagram drawn 18 months ago. In Zof's architecture that map is the System Graph, and its job is to make validation change-aware: you test what the change actually reaches, not a fixed suite that runs identically regardless of what moved.

2. Validate the seams, especially the cross-team ones. The graph makes the dangerous edges visible precisely because it ignores org structure. Where a change crosses from one team's node to another's, that contract is where you concentrate validation. Static scripts rot at exactly these boundaries because no single team owns the seam. Testing Fleets plan, execute, observe, and maintain validation as the system evolves, so the seam stays covered as the contract drifts, instead of decaying into a test nobody updates.

3. Make priority a verdict, not a dashboard. Knowing the blast radius is not the same as containing it. A graph that renders risk beautifully but cannot gate a release is still just a picture. The point of computing reachable consequence is to act on it, governed by policy, with a human holding authority at the decisions that warrant it.
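The first and third shifts can be sketched together: score each change by what it reaches, then turn the score into a verdict a pipeline could enforce. This is an illustrative sketch under stated assumptions, not Zof's System Graph or its API. The `CALLERS` map, the `CRITICALITY` weights, the `NEEDS_HUMAN` policy set, and the threshold are all invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    name: str
    component: str
    lines_changed: int  # recorded, but deliberately ignored by the scoring

# Hypothetical graph: callee -> direct callers.
CALLERS = {
    "http_client": ["payments", "auth", "search"],
    "payments": ["checkout"],
    "auth": ["checkout", "partner_api"],
    "string_utils": [],  # leaf utility with no live callers
}
# Hypothetical business weights; unlisted components default to 1.
CRITICALITY = {"checkout": 5, "payments": 5, "auth": 4, "partner_api": 3, "search": 1}
# Hypothetical policy: anything reaching these needs a human decision.
NEEDS_HUMAN = {"payments"}

def reach(component):
    """The component itself plus everything that transitively calls it."""
    seen, stack = {component}, [component]
    while stack:
        for caller in CALLERS.get(stack.pop(), ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen

def score(change):
    """Reachable consequence: summed criticality of everything the change reaches."""
    return sum(CRITICALITY.get(node, 1) for node in reach(change.component))

def verdict(change, full_threshold=5):
    """Map reach to an enforceable decision instead of a dashboard number."""
    if reach(change.component) & NEEDS_HUMAN:
        return "hold-for-human-approval"
    if score(change) >= full_threshold:
        return "full-validation"
    return "light-validation"

changes = [
    Change("refactor string_utils", "string_utils", lines_changed=400),
    Change("bump http_client", "http_client", lines_changed=1),
    Change("tweak auth token TTL", "auth", lines_changed=3),
]
ranked = sorted(changes, key=score, reverse=True)
```

Note that the 400-line refactor ranks last and the one-line dependency bump ranks first. Diff size never enters the score, which is the point.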

A hypothetical, walked through the edge

Consider a fintech team that ships a change to a shared rate-limiting library. The folder it lives in is quiet. The team that owns it is small. Folder- and team-driven prioritization would both wave it through with a light touch.

The graph tells a different story. That library sits on the path of the payments service, the auth service, and a partner-facing API. Its centrality is high and its blast radius spans three revenue-critical flows owned by three different teams. Graph-driven prioritization flags it as a top-priority validation target, routes the heaviest scrutiny to the cross-team seams, and, because a payments path is involved, requires human authorization before any remediation executes. The principle holds throughout: agents propose, humans authorize. Autonomy is real, but it is bounded by policy and accountable to a person at the decisions that genuinely matter.
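The walkthrough above can be sketched in a few lines. The team names, the `OWNER` map, and the `PROTECTED` set are made up for the example, and the check only looks at direct callers of the changed library to keep it short.

```python
# Hypothetical ownership and call edges for the rate-limiting scenario.
OWNER = {
    "rate_limit_lib": "platform",
    "payments": "payments-team",
    "auth": "identity-team",
    "partner_api": "partnerships-team",
}
CALLS = [  # (caller, callee)
    ("payments", "rate_limit_lib"),
    ("auth", "rate_limit_lib"),
    ("partner_api", "rate_limit_lib"),
]
PROTECTED = {"payments"}  # assumed policy: payment paths need a human sign-off

def cross_team_seams(changed, calls, owner):
    """Edges into `changed` whose caller belongs to a different team."""
    return sorted(
        (caller, callee)
        for caller, callee in calls
        if callee == changed and owner[caller] != owner[callee]
    )

def needs_human_authorization(changed, calls, protected):
    """True if any direct caller of `changed` is on a protected path."""
    return any(callee == changed and caller in protected for caller, callee in calls)

seams = cross_team_seams("rate_limit_lib", CALLS, OWNER)
must_ask_human = needs_human_authorization("rate_limit_lib", CALLS, PROTECTED)
```

Every edge into the library crosses a team line, so all three become validation targets, and the payments caller is what trips the human-authorization requirement.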

What to do Monday morning

You do not need to rebuild your platform to test this thesis. You need to start scoring risk by reach instead of by location.

Map your last five incidents to edges, not files. For each, write the dependency edge that actually broke and the node whose centrality made it hurt. You will likely find the originating change lived in a low-attention folder.
Identify your highest-centrality nodes. Name the five services that sit on the most paths. Confirm your test priority reflects their consequence, not their commit frequency.
Find one cross-team seam with no owned validation. Make the contract on that edge explicitly tested. That is usually your cheapest large risk reduction.
Ask whether your prioritization is reachability-aware. If you triage findings as a flat list, you are paying full attention to unreachable noise. Sorting by what is live in the graph is where the 70 to 90 percent exposure reduction comes from.
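For the second item, a quick audit can be as simple as comparing how depended-on a service is with how much test attention it gets. The sketch below assumes you can export two numbers per service, a transitive dependent count and a test count. The figures and the "below the median test count" rule are placeholders, not a recommended threshold.

```python
from statistics import median

def neglected_hubs(dependent_counts, test_counts, top_n=3):
    """Among the top-N most depended-on services, return those tested below the median."""
    cutoff = median(test_counts.values())
    hubs = sorted(dependent_counts, key=dependent_counts.get, reverse=True)[:top_n]
    return [svc for svc in hubs if test_counts.get(svc, 0) < cutoff]

# Hypothetical export: how many services depend on each one, and its test count.
dependent_counts = {
    "auth": 14, "ledger_db": 11, "feature_flags": 9, "promo_page": 1, "ui_banner": 0,
}
test_counts = {
    "auth": 120, "ledger_db": 12, "feature_flags": 8, "promo_page": 240, "ui_banner": 300,
}

gaps = neglected_hubs(dependent_counts, test_counts)
```

In this made-up data the high-churn pages hold most of the tests, while two of the three most central services sit below the median. That is the folder-driven distortion showing up in numbers.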

If you want the longer argument on why AI-era code volume forces this shift, the AI code testing imperative whitepaper makes the case, and the "how it works" page shows the change-aware loop end to end. For the connection to security triage specifically, The Security Debt Crisis covers the reachability argument in depth.

The bottom line
