Risk Follows Dependencies, Not Folders: Rethinking Where to Test First
Incidents travel along dependency edges, not directory trees. Why test prioritization should follow graph centrality and reachability, not folders or team boundaries.
The folder is a lie your repo tells you
A repository's directory structure is an artifact of how humans like to file things, not a model of how failure propagates at runtime. Two files in the same folder can be operationally unrelated. Two services in different repos, owned by different teams, can be a single shared connection pool away from taking each other down.
When prioritization is folder-driven, you get predictable distortions. The code that changes most often gets the most tests, regardless of whether it is load-bearing. A sleepy module with one commit a quarter gets neglected, even when half the platform depends on it. Test budget pools around churn, not consequence.
Team-driven prioritization fails the same way for a different reason. Org boundaries are political, not architectural. The most dangerous changes are frequently the ones that cross a team line, because that is exactly where shared assumptions go unstated and contracts go unverified. A change owned by Team A can detonate in a service owned by Team C, and neither team's local test suite is looking at the seam between them.
The honest unit of risk is the edge. A dependency edge is a promise: this caller assumes that callee behaves a certain way. Most incidents are a broken promise on an edge somebody forgot was load-bearing.
Centrality, blast radius, and reachability
If edges carry risk, then some nodes carry far more than others, and graph theory gives us the vocabulary to say which.
- Centrality measures how much of the system's behavior routes through a node. A shared auth service, a primary datastore, a feature-flag service, a message broker: these sit on a huge fraction of paths. A regression here is not a local bug. It is a system event.
- Blast radius is the set of nodes reachable downstream from a change. It answers "if this breaks, who breaks with it?" Two changes of identical code complexity can have wildly different blast radii depending on where they sit.
- Reachability is the inverse and the sharper lens for risk. A flaw only matters if there is a live path that exercises it. A vulnerability in a code path no production caller reaches is a finding, not an incident. One in a path that every checkout hits is a fire.
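To make the three terms concrete, here is a minimal Python sketch over a small, hypothetical service graph. It assumes the graph is stored as caller-to-callee edges (`DEPENDS_ON`) and that `web` and `mobile_api` are the production entry points. For centrality it uses a deliberately simple proxy, the share of entry points whose dependency closure includes a node, rather than a formal measure such as betweenness centrality. All service names are illustrative.

```python
from collections import deque

# Hypothetical service graph: each key calls (depends on) the services in its list.
# Names, edges, and entry points are illustrative assumptions.
DEPENDS_ON = {
    "web": ["checkout", "auth"],
    "mobile_api": ["checkout", "auth"],
    "checkout": ["payments", "flags"],
    "payments": ["db"],
    "auth": ["db", "flags"],
    "flags": [],
    "db": [],
    "report_job": ["legacy_util"],  # batch job, not a production entry point
    "legacy_util": [],
}
ENTRY_POINTS = ["web", "mobile_api"]


def dependents_index(depends_on):
    """Invert caller -> callee edges so we can walk from a callee to its callers."""
    dependents = {node: [] for node in depends_on}
    for caller, callees in depends_on.items():
        for callee in callees:
            dependents.setdefault(callee, []).append(caller)
    return dependents


def blast_radius(changed, depends_on):
    """Every node that transitively depends on `changed`: who breaks with it."""
    dependents = dependents_index(depends_on)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for caller in dependents.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def transitive_deps(start, depends_on):
    """Every node reachable from `start` by following dependency edges (includes start)."""
    seen, stack = {start}, [start]
    while stack:
        for callee in depends_on.get(stack.pop(), []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def entry_point_centrality(node, depends_on, entry_points):
    """Share of entry points whose dependency closure includes `node` (a simple proxy)."""
    hits = sum(node in transitive_deps(e, depends_on) for e in entry_points)
    return hits / len(entry_points)
```

With this graph, a change to `db` has a blast radius of five services and sits under every entry point, while `legacy_util` reaches only `report_job` and no entry point depends on it at all.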
Reachability is where the leverage is. Industry research on reachability-based prioritization points to 70 to 90 percent less exposure demanding urgent attention, not because the flaws disappear, but because you stop spending equal attention on unreachable noise and start spending it on the paths that can actually be exercised. The same logic that sharpens security triage sharpens reliability triage. The graph tells you which edges are live.
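Reachability-aware triage can be sketched the same way. The snippet below is a hypothetical illustration, not any particular scanner's API: it walks a call graph from assumed production entry points, then splits findings into live and unreachable buckets. It assumes a static call graph, which can miss reflection, dynamic dispatch, and plugin loading, so "unreachable" here should be read as "deprioritized", not "safe".

```python
# Hypothetical call graph (caller -> callees) and findings; all names are illustrative.
CALLS = {
    "checkout_handler": ["price_calc", "card_tokenize"],
    "price_calc": ["currency_fmt"],
    "card_tokenize": [],
    "currency_fmt": [],
    "old_export": ["xml_parse"],  # no production entry point calls old_export
    "xml_parse": [],
}
PRODUCTION_ENTRY_POINTS = ["checkout_handler"]

FINDINGS = [
    {"id": "F1", "function": "xml_parse", "severity": 9},
    {"id": "F2", "function": "currency_fmt", "severity": 5},
    {"id": "F3", "function": "card_tokenize", "severity": 7},
]


def live_functions(calls, entry_points):
    """Functions reachable from at least one production entry point."""
    seen, stack = set(entry_points), list(entry_points)
    while stack:
        for callee in calls.get(stack.pop(), []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def triage(findings, calls, entry_points):
    """Split findings into live (highest severity first) and unreachable."""
    live = live_functions(calls, entry_points)
    reachable = [f for f in findings if f["function"] in live]
    unreachable = [f for f in findings if f["function"] not in live]
    reachable.sort(key=lambda f: f["severity"], reverse=True)
    return reachable, unreachable
```

The highest-severity finding, `F1`, lands in the unreachable bucket because only `old_export` calls `xml_parse` and no entry point calls `old_export`; the live findings come back ordered `F3`, then `F2`.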
Why this is now urgent, not academic
Graph-driven prioritization has always been the correct idea. What changed is that the cost of getting it wrong went vertical.
Roughly 41 percent of code is now AI-generated, and industry research puts the share of AI coding tasks that introduce critical flaws or security issues near 45 percent. That combination matters for one specific reason: AI-generated code does not respect your folder conventions or your team boundaries either. It produces volume across the whole graph, and it introduces edges (new calls, new dependencies, new assumptions) faster than humans can track them manually.
Meanwhile, an estimated 80 percent of developers bypass policy and guardrails when those guardrails are merely advisory. So the controls that might have caught a risky edge are being routed around just as the edges multiply. The cost of poor software quality in the US, estimated at $2.41 trillion, is in significant part the bill for changes whose true reach nobody computed before they shipped.
You cannot out-staff this with more reviewers staring at more diffs. The volume is structural. The only durable answer is to make the dependency graph a first-class input to where you validate, and to keep that graph current automatically as the system changes.
What graph-driven prioritization actually looks like
Concretely, prioritizing by graph means three shifts from how most teams operate today.
1. Rank by reachable consequence, not by churn or ownership. When a change lands, the first question the system should answer is which downstream nodes it can reach and how central those nodes are. A one-line change to a leaf utility with no live callers is low priority no matter how much it churns. A dependency bump on a service that sits on the critical path of three revenue flows is high priority even if the diff is trivial. This requires a live map of services, dependencies, and CI/CD, not a stale architecture diagram drawn 18 months ago. In Zof's architecture that map is the System Graph, and its job is to make validation change-aware: you test what the change actually reaches, not a fixed suite that runs identically regardless of what moved.
2. Validate the seams, especially the cross-team ones. The graph makes the dangerous edges visible precisely because it ignores org structure. Where a change crosses from one team's node to another's, that contract is where you concentrate validation. Static scripts rot at exactly these boundaries because no single team owns the seam. Testing Fleets plan, execute, observe, and maintain validation as the system evolves, so the seam stays covered as the contract drifts, instead of decaying into a test nobody updates.
3. Make priority a verdict, not a dashboard. Knowing the blast radius is not the same as containing it. A graph that renders risk beautifully but cannot gate a release is still just a picture. The point of computing reachable consequence is to act on it, governed by policy, with a human holding authority at the decisions that warrant it.
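The first two shifts can be sketched as a ranking function plus a seam finder. This is a minimal illustration under stated assumptions, not Zof's System Graph: the reverse-dependency map, team owners, criticality weights, and churn counts are all hypothetical, and the score is simply the sum of criticality weights across the blast radius.

```python
from collections import deque

# Hypothetical reverse-dependency graph: node -> services that depend on it.
# Names, owners, weights, and churn figures are illustrative assumptions.
DEPENDENTS = {
    "string_utils": [],  # leaf utility with no live callers
    "billing_client": ["checkout", "invoicing", "renewals"],
    "checkout": [],
    "invoicing": [],
    "renewals": [],
}
OWNER = {
    "string_utils": "platform",
    "billing_client": "payments",
    "checkout": "storefront",
    "invoicing": "finance",
    "renewals": "payments",
}
# Hypothetical consequence weights; unlisted nodes default to 1.
CRITICALITY = {"checkout": 5, "invoicing": 3, "renewals": 3}


def blast_radius(node):
    """All nodes that transitively depend on `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def reach_score(node):
    """Sum of consequence weights over everything the change can reach."""
    return sum(CRITICALITY.get(n, 1) for n in blast_radius(node))


def cross_team_seams(node):
    """Dependency edges inside the blast radius that cross an ownership line."""
    seams = []
    for callee in {node} | blast_radius(node):
        for caller in DEPENDENTS.get(callee, []):
            if OWNER[callee] != OWNER[caller]:
                seams.append((callee, caller))
    return sorted(seams)


def rank_changes(changes):
    """Order changes by reachable consequence; churn is deliberately ignored."""
    return sorted(changes, key=lambda c: reach_score(c["node"]), reverse=True)


CHANGES = [
    {"node": "string_utils", "commits_last_quarter": 40},
    {"node": "billing_client", "commits_last_quarter": 1},
]
```

Here `billing_client`, with one commit last quarter, outranks `string_utils`, with forty, because its reach score is 11 against 0. Its edges to `checkout` and `invoicing` cross team lines and are where validation concentrates; the edge to `renewals` stays inside the payments team.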
A hypothetical, walked through the edge
Consider a fintech team that ships a change to a shared rate-limiting library. The folder it lives in is quiet. The team that owns it is small. Folder- and team-driven prioritization would both wave it through with a light touch.
The graph tells a different story. That library sits on the path of the payments service, the auth service, and a partner-facing API. Its centrality is high and its blast radius spans three revenue-critical flows owned by three different teams. Graph-driven prioritization flags it as a top-priority validation target, routes the heaviest scrutiny to the cross-team seams, and, because a payments path is involved, requires human authorization before any remediation executes. The principle holds throughout: agents propose, humans authorize. Autonomy is real, but it is bounded by policy and accountable to a person at the decisions that genuinely matter.
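The walkthrough can be expressed as a small policy check that turns the graph into a verdict. This is a hypothetical sketch: the team names, the `REVENUE_CRITICAL` and `HUMAN_AUTHORIZATION_REQUIRED` sets, and the priority thresholds are assumptions chosen to mirror the example, and `remediation_status` only returns a status string rather than executing anything.

```python
from collections import deque
from dataclasses import dataclass

# The hypothetical fintech graph: node -> services that depend on it.
# Team names and policy sets are illustrative assumptions.
DEPENDENTS = {
    "rate_limiter": ["payments", "auth", "partner_api"],
    "payments": [],
    "auth": [],
    "partner_api": [],
}
OWNER = {
    "rate_limiter": "platform",
    "payments": "payments-team",
    "auth": "identity-team",
    "partner_api": "partner-team",
}
REVENUE_CRITICAL = {"payments", "auth", "partner_api"}
HUMAN_AUTHORIZATION_REQUIRED = {"payments"}  # policy: payments paths need a person


@dataclass
class Verdict:
    priority: str      # "top", "elevated", or "routine"
    seams: list        # cross-team edges that get the heaviest validation
    needs_human: bool  # remediation must wait for a named approver


def blast_radius(node):
    """All nodes that transitively depend on `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def verdict_for(changed):
    """Turn reachable consequence into a priority, a seam list, and a gate decision."""
    radius = blast_radius(changed)
    critical = radius & REVENUE_CRITICAL
    if len(critical) >= 2:
        priority = "top"
    elif critical:
        priority = "elevated"
    else:
        priority = "routine"
    seams = sorted(
        (callee, caller)
        for callee in {changed} | radius
        for caller in DEPENDENTS.get(callee, [])
        if OWNER[callee] != OWNER[caller]
    )
    return Verdict(priority, seams, bool(radius & HUMAN_AUTHORIZATION_REQUIRED))


def remediation_status(verdict, approved_by=None):
    """Agents propose; a fix on a gated path proceeds only once a person approves."""
    if verdict.needs_human and approved_by is None:
        return "awaiting human authorization"
    return f"approved by {approved_by or 'policy'}"
```

For `rate_limiter`, the verdict is top priority with three cross-team seams, and `remediation_status` reports "awaiting human authorization" until an approver is named.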
What to do Monday morning
You do not need to rebuild your platform to test this thesis. You need to start scoring risk by reach instead of by location.
- Map your last five incidents to edges, not files. For each, write the dependency edge that actually broke and the node whose centrality made it hurt. You will likely find the originating change lived in a low-attention folder.
- Identify your highest-centrality nodes. Name the five services that sit on the most paths. Confirm your test priority reflects their consequence, not their commit frequency.
- Find one cross-team seam with no owned validation. Make the contract on that edge explicitly tested. That is usually your cheapest large risk reduction.
- Ask whether your prioritization is reachability-aware. If you triage findings as a flat list, you are paying full attention to unreachable noise. Sorting by what is live in the graph is where the 70 to 90 percent exposure reduction comes from.
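For the second item above, one way to name the nodes that "sit on the most paths" is to count entry-to-leaf paths through each node. The sketch below is an assumption-laden illustration: it uses a hypothetical graph, treats nodes with no dependencies as path endpoints, and requires the graph to be acyclic (real service graphs often contain cycles, which you would first collapse into strongly connected components). A node's count is the number of paths reaching it from an entry point multiplied by the number of paths from it to a leaf.

```python
# Hypothetical acyclic service graph: caller -> callees. Names are illustrative.
DEPENDS_ON = {
    "web": ["checkout", "search"],
    "mobile": ["checkout", "auth"],
    "checkout": ["auth", "payments"],
    "search": ["db"],
    "auth": ["db"],
    "payments": ["db"],
    "db": [],
}
ENTRY_POINTS = ["web", "mobile"]


def count_paths_through(depends_on, entries):
    """Number of entry-to-leaf paths passing through each node (assumes a DAG)."""
    callers = {n: [] for n in depends_on}
    for caller, callees in depends_on.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    entry_set = set(entries)
    to_leaf, from_entry = {}, {}

    def paths_to_leaf(v):
        # A node with no dependencies ends exactly one path.
        if v not in to_leaf:
            callees = depends_on.get(v, [])
            to_leaf[v] = 1 if not callees else sum(paths_to_leaf(c) for c in callees)
        return to_leaf[v]

    def paths_from_entry(v):
        # Paths starting at v itself (if it is an entry) plus paths via its callers.
        if v not in from_entry:
            start = 1 if v in entry_set else 0
            from_entry[v] = start + sum(paths_from_entry(p) for p in callers.get(v, []))
        return from_entry[v]

    return {v: paths_from_entry(v) * paths_to_leaf(v) for v in callers}


def top_central(depends_on, entries, n=5):
    """The n nodes on the most paths, ties broken alphabetically."""
    counts = count_paths_through(depends_on, entries)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

On this graph, all six entry-to-leaf paths end at `db` and four of them pass through `checkout`, so `top_central(DEPENDS_ON, ENTRY_POINTS, n=3)` puts those two first, followed by `auth`, regardless of how often their code changes.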
If you want the longer argument on why AI-era code volume forces this shift, the AI code testing imperative whitepaper makes the case, and the "How it works" page shows the change-aware loop end to end. For the connection to security triage specifically, The Security Debt Crisis covers the reachability argument in depth.
The bottom line
