Security and Governance

Your SAST Scanner Wasn't Built for AI-Generated Code. Here's What Reachability Changes.

SAST scanners flood the backlog as more code is AI-generated. Learn how reachability-driven triage cuts exploitable exposure by 70-90%, rather than just cutting alert volume.

Zof Reliability Team · Engineering and Product

September 23, 2025 · 7 min read · Updated September 23, 2025

The volume problem your scanner can't see past

Traditional static analysis is a pattern matcher. It walks the code, flags constructs that match known-bad signatures, and hands you a list. That model worked acceptably when code volume was bounded by typing speed and findings arrived at a rate a security team could triage between sprints.

Industry research now estimates that roughly 41% of code is AI-generated, and suggests that around 45% of AI coding tasks introduce a critical flaw or security issue. Read those together and the arithmetic turns hostile. You are merging more code, faster, with a higher base rate of defects, into a scanner that responds to every defect-shaped pattern with equal urgency. The output is not a prioritized risk list. It is a flood.

The flood has a predictable failure mode, and as an SRE you have probably already lived it. A backlog of thousands of "high" findings is operationally identical to zero findings, because no one can act on it. The team triages the first screen, declares the rest technical debt, and the scanner's signal decays into noise that everyone learns to scroll past. Meanwhile the cost of poor software quality sits near $2.41 trillion, and a real exploitable path is buried somewhere on page 40 next to four hundred theoretical ones.

The scanner is not lying. A SQL string built from untrusted input genuinely is a SQL injection pattern. The problem is that the scanner has no idea whether that code path is ever reached, whether the input is ever attacker-controlled, or whether the function ships in a binary that runs in production at all. It treats a vulnerability in dead code and a vulnerability on your auth hot path as the same line item. At AI scale, that flattening is fatal.

Why AI-generated code breaks the assumptions twice

AI output stresses volume-based scanning along two axes at once, and the second is the one that gets missed.

The obvious axis is throughput: more lines, more dependencies pulled in, more surface area, all arriving faster than review can keep pace. A generation tool will happily add a transitive dependency to solve a small problem, and that dependency carries its own vulnerability profile your scanner now has to account for.

The subtler axis is shape. Human-written code tends to cluster risk where humans are paying attention. AI-generated code distributes plausible-looking constructs everywhere, including in paths that are scaffolding, examples, or never wired into a live call graph. So the AI-era backlog is not just bigger. It is structurally noisier, with a lower ratio of reachable-and-exploitable findings to total findings. A scanner that ranks by pattern severity will rank that noise as urgent.

This is also why "just add another gate" backfires. A blanket scanner gate that blocks merges on any high-severity pattern, against AI-volume input, becomes the slowest thing in your pipeline. And research suggests around 80% of developers already bypass policy and guardrails. A gate that fights the developer at AI scale does not get respected. It gets disabled, or routed around, and now you have neither speed nor coverage.

What reachability actually means

Reachability analysis asks a different and more honest question. Not "does this code contain a dangerous pattern," but "can untrusted input actually flow to this dangerous pattern in a way that executes in production." It is the difference between a vulnerability that exists and a vulnerability that is exposed.

A finding is worth your attention when several conditions hold together:

The vulnerable code is in a code path that actually executes, not dead or unlinked code.
There is a data flow from an untrusted source to the dangerous sink, not a constant or an internal-only value.
The path is present in what you ship and deploy, not in a test fixture or an unused branch.
No existing control already neutralizes it before the sink, such as validation or an isolation boundary.

When you filter the raw scanner output through those questions, the list collapses. Most flagged patterns fail at least one condition. Zof's published data puts the effect at 70 to 90% less exploitable exposure when prioritization is driven by reachability rather than raw pattern count. That is not the scanner finding fewer bugs. It is you spending your finite triage hours on the bugs an attacker could actually use.
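As a minimal sketch of that filter, the four conditions can be written as a single predicate over each finding. The `Finding` fields, the rule names, and the sample data are hypothetical, chosen only for illustration; in practice the answers would come from your own analysis of the system, not from the scanner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A raw scanner finding plus answers to the four questions (hypothetical fields)."""
    rule: str
    executes: bool        # the code path actually runs, not dead or unlinked
    untrusted_flow: bool  # untrusted input flows to the dangerous sink
    shipped: bool         # the path is present in what is built and deployed
    neutralized: bool     # an existing control stops it before the sink


def is_exposed(f: Finding) -> bool:
    """Worth attention only when all four conditions hold together."""
    return f.executes and f.untrusted_flow and f.shipped and not f.neutralized


def triage(findings):
    """Split findings into (exposed, parked). Nothing is deleted."""
    exposed = [f for f in findings if is_exposed(f)]
    parked = [f for f in findings if not is_exposed(f)]
    return exposed, parked


# Illustrative data only: five pattern matches, one of which is exposed.
raw = [
    Finding("sql-injection", True, True, True, False),    # live path, no control
    Finding("sql-injection", False, True, True, False),   # dead code
    Finding("path-traversal", True, False, True, False),  # constant input
    Finding("xss", True, True, False, False),             # test fixture only
    Finding("ssrf", True, True, True, True),              # validated upstream
]

exposed, parked = triage(raw)
print(f"{len(exposed)} exposed, {len(parked)} parked")
```

The parked findings stay in the list on purpose, so a later change to the call graph or the deployment can promote them again.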

The catch is that reachability is not a property of a file. It is a property of the system. You cannot determine whether a sink is reachable by reading the function in isolation, because reachability depends on what calls it, what data reaches it, what is deployed, and what controls sit in between. This is exactly where a pattern-matching scanner is blind, and it is why reachability has to be grounded in a live model of the running system rather than a static rule set.

How reachability-driven triage fits the reliability loop

Treat reachability not as a feature you bolt onto a scanner, but as a stage in a governed validation loop: understand the system, test against it, reproduce what matters, remediate under authorization, verify the fix held.

It starts with understanding. A System Graph that maps your services, dependencies, and CI/CD is what makes reachability computable in the first place. The graph knows which services call which, which inputs are externally exposed, and what actually ships. A raw finding from a scanner becomes a question the graph can answer: is this sink on a live path from an untrusted edge? That single join is what converts a flat severity list into a ranked exposure list.
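The join described above can be sketched as a graph traversal. This is not Zof's System Graph or its API. It is a toy call graph with hypothetical service names, where a breadth-first search from externally exposed entry points decides which sinks sit on a live, shipped path.

```python
from collections import deque

# Hypothetical call graph: each key calls the services listed in its value.
CALLS = {
    "public_api": ["auth_service", "search_service"],
    "auth_service": ["user_db_query"],
    "search_service": ["index_lookup"],
    "batch_job": ["legacy_report_query"],  # internal only, no external caller
}
UNTRUSTED_EDGES = {"public_api"}  # externally exposed entry points (assumed)
SHIPPED = {
    "public_api", "auth_service", "search_service",
    "user_db_query", "index_lookup", "batch_job", "legacy_report_query",
}


def reachable_from_untrusted(graph, entry_points, shipped):
    """Breadth-first search over shipped nodes, starting at untrusted entry points."""
    queue = deque(e for e in entry_points if e in shipped)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee in shipped and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Two findings with the same pattern severity, given as (rule, sink node).
findings = [
    ("sql-injection", "legacy_report_query"),
    ("sql-injection", "user_db_query"),
]

live = reachable_from_untrusted(CALLS, UNTRUSTED_EDGES, SHIPPED)
# Sinks on a live path sort first; unreachable ones drop down the list.
ranked = sorted(findings, key=lambda f: f[1] not in live)
for rule, sink in ranked:
    print(rule, sink, "reachable" if sink in live else "not reachable")
```

A real graph would also carry data-flow and control information. This sketch covers only call-level reachability, which over-approximates in the same way static analysis does.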

From there, the open question on any reachability claim is whether the path is theoretically reachable or actually exploitable. Static analysis over-approximates; it will say "reachable" when it is unsure. Testing Fleets close that gap by attempting to exercise the suspected path against the running system rather than reasoning about it on paper. A path you can drive an input through is confirmed. A path you cannot is downgraded. That confirmation step is what earns the right to interrupt an engineer, and it is the difference between a triage queue people trust and one they ignore.
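The confirm-or-downgrade step can be sketched as a small state transition. The `probe` callable stands in for whatever actually exercises the path against the running system. It, the path identifiers, and the status names are assumptions made for illustration, not a description of how Testing Fleets are implemented.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    STATICALLY_REACHABLE = "statically_reachable"  # static analysis says "maybe"
    CONFIRMED = "confirmed"                        # an input was driven through
    DOWNGRADED = "downgraded"                      # the path could not be exercised


def confirm(path_id: str, probe: Callable[[str], bool]) -> Status:
    """Promote or downgrade a statically reachable path based on a live probe.

    `probe` is hypothetical: it returns True when a test input was actually
    driven through the path against the running system.
    """
    return Status.CONFIRMED if probe(path_id) else Status.DOWNGRADED


# Stand-in for a real test harness: the paths this fake system lets through.
DRIVABLE = {"public_api->auth_service->user_db_query"}


def fake_probe(path_id: str) -> bool:
    return path_id in DRIVABLE


for path in (
    "public_api->auth_service->user_db_query",
    "public_api->search_service->index_lookup",
):
    print(path, confirm(path, fake_probe).value)
```

Only findings that end up `CONFIRMED` would go on to interrupt an engineer. The rest stay recorded with their downgraded status.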

Confirmed, reachable, exploitable findings are the small set worth fixing now. Here the governance principle is not optional decoration: agents propose, humans authorize. Remediation Fleets can draft the fix and the evidence, but the merge stays a human decision under policy and audit. Unsupervised autonomous patching of security-sensitive code is reckless precisely because the blast radius is high; the engineering is in the governed approval, not the raw automation. For teams in regulated or sensitive environments, this can run inside your own boundary via Edge Runners, so neither your code nor your dependency graph leaves your control to be analyzed.
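One way to picture "agents propose, humans authorize" is a proposal object that refuses to merge without a named approver and records every step. The class, the approver list, and the identifiers below are all hypothetical. This is a sketch of the principle, not of Remediation Fleets or any real merge tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Hypothetical policy: the named humans allowed to authorize security fixes.
AUTHORIZED_APPROVERS = {"alice@example.com", "bob@example.com"}


@dataclass
class FixProposal:
    finding_id: str
    patch_summary: str
    evidence: str
    proposed_by: str  # identity of the agent that drafted the fix
    approved_by: Optional[str] = None
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def __post_init__(self):
        self._record("proposed", self.proposed_by)

    def _record(self, event: str, actor: str) -> None:
        """Append a (timestamp, event, actor) entry to the audit trail."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, event, actor))

    def approve(self, approver: str) -> None:
        """Only a named, authorized human can approve."""
        if approver not in AUTHORIZED_APPROVERS:
            self._record("approval_rejected", approver)
            raise PermissionError(f"{approver} is not an authorized approver")
        self.approved_by = approver
        self._record("approved", approver)

    def merge(self) -> bool:
        """The merge is refused unless a human has authorized it."""
        if self.approved_by is None:
            self._record("merge_blocked", self.proposed_by)
            raise PermissionError("merge requires a named human approver")
        self._record("merged", self.approved_by)
        return True


proposal = FixProposal(
    finding_id="FND-1",
    patch_summary="parameterize the query",
    evidence="probe drove untrusted input to the sink",
    proposed_by="agent:remediation",
)
try:
    proposal.merge()  # the agent cannot merge its own proposal
except PermissionError as err:
    print("blocked:", err)

proposal.approve("alice@example.com")
print("merged:", proposal.merge())
print([event for _, event, _ in proposal.audit_log])
```

The audit trail records the blocked attempt as well as the approval, so both the refusal and the authorization can be attributed later.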

What to do Monday morning

You do not need to replace your scanner to start triaging by reachability. You need to stop treating its raw output as a priority list.

Measure your real ratio. Take last quarter's scanner findings and estimate how many were ever confirmed reachable and exploitable. The gap between total and reachable is your noise tax, and it is probably enormous.
Re-rank by exposure, not severity. Before anyone touches a finding, ask whether untrusted input can reach it on a shipped path. Park everything that fails that test. Do not delete it; deprioritize it honestly.
Stop gating on volume. Replace any blanket "block on high-severity pattern" gate with one that blocks only on confirmed-reachable findings. A proportionate gate is one developers stop bypassing.
Make the authorization boundary explicit. Write down which classes of confirmed finding auto-route to a fix proposal and which require a named human to authorize the merge. Ambiguity here is what produces both bottlenecks and skipped reviews.
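The first and third steps reduce to a few lines of arithmetic and one predicate. The sketch below uses made-up numbers and a hypothetical finding shape with `severity` and `confirmed_reachable` keys. Substitute your own scanner export and confirmation data.

```python
def noise_tax(total_findings: int, confirmed_reachable: int) -> float:
    """Share of findings that were never confirmed reachable and exploitable."""
    if total_findings == 0:
        return 0.0
    return 1 - confirmed_reachable / total_findings


def blocking_findings(findings):
    """Proportionate gate: only confirmed-reachable findings block the merge."""
    return [f for f in findings if f["confirmed_reachable"]]


def gate_exit_code(findings) -> int:
    """Return the exit code a CI step would use: 1 blocks the merge, 0 passes."""
    return 1 if blocking_findings(findings) else 0


# Hypothetical quarter: 4000 findings, 120 ever confirmed reachable.
tax = noise_tax(total_findings=4000, confirmed_reachable=120)
print(f"noise tax: {tax:.0%}")

# Hypothetical merge request: two high-severity patterns, neither confirmed.
merge_request = [
    {"rule": "sql-injection", "severity": "high", "confirmed_reachable": False},
    {"rule": "xss", "severity": "high", "confirmed_reachable": False},
]
print("exit code:", gate_exit_code(merge_request))

# The same gate blocks as soon as one finding is confirmed reachable.
merge_request.append(
    {"rule": "ssrf", "severity": "medium", "confirmed_reachable": True}
)
print("exit code:", gate_exit_code(merge_request))
```

The gate never looks at `severity`. A blanket severity gate would have blocked the first merge request, and this one blocks only the second.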

The bottom line

