Speed Without Clarity Is Just Motion
Velocity metrics measure motion, not progress. A first-principles case for why deploy frequency without system-level clarity and change-aware validation is vanity.
The metric you ship is the metric you optimize
Engineering organizations are unusually obedient to their own dashboards. Put deploy frequency on the wall and teams will deploy more frequently. Put lead time next to it and they will compress lead time. This is not gaming; it is the system doing exactly what you asked. The trouble is what these metrics structurally cannot encode.
Deploy frequency counts events. It says nothing about whether each event left the system more or less reliable. Lead time measures the duration from commit to production. It is silent on whether the thing you shipped was understood before it shipped. Both are magnitudes. Both improve when you remove friction, and the fastest friction to remove is always the part where someone stops to check whether a change is safe.
So the dashboard quietly rewards the wrong behavior. Teams learn that the validation step is the tax on their headline number, and they route around it. This is not hypothetical. Industry research puts the share of developers who bypass policy or guardrails near 80%. Read that as a direct consequence of measuring speed and treating validation as overhead. When the only number that earns praise is the magnitude, the direction becomes the thing you sacrifice to improve it.
Why AI made this a structural emergency, not a culture problem
For years you could paper over a thin validation layer with the simple fact that humans write code slowly. A senior engineer reading a diff was, in effect, a low-throughput but high-context validator. The bottleneck was also a safeguard.
That assumption is broken. Roughly 41% of code is now AI-generated, and the rate of production is climbing past the rate at which any team can manually reason about it. The safeguard that came bundled with slow human authorship is gone, and what replaced it is not benign. Industry research finds that around 45% of AI coding tasks introduce critical flaws or security issues. So the magnitude of change is rising sharply at the same moment the per-change risk is rising. Multiply two growing quantities and you do not get a linear problem. You get the aggregate that shows up as the estimated $2.41 trillion annual cost of poor software quality: the bill for shipping changes nobody could fully vouch for.
The naive response is to slow down, and it is wrong. Slowing down does not restore clarity; it just trades one bad number for another. The correct response is to stop measuring motion and start measuring directed velocity, which requires knowing, for every change, where it is taking the system. That is a visibility and validation problem, not a pace problem.
Clarity has two components, and most stacks have neither
If velocity is magnitude times direction, then clarity, the thing that converts your motion into progress, decomposes into two specific capabilities. Naming them precisely matters, because vague calls for "better engineering culture" are how this conversation usually dies.
- System-level visibility: knowing what a change can reach. Not a wiki diagram updated quarterly, but a live, machine-readable model of services, dependencies, and CI/CD that reflects the system as it actually is right now. Without it, "is this change safe?" is unanswerable in principle, because safety is defined relative to what the change touches. This is the function of a System Graph: it makes validation change-aware instead of running the same static suite regardless of what moved.
- Change-aware validation: proving the direction before you commit to it. Static test scripts measure motion. They run, they pass, and they tell you nothing about whether this specific change degraded something three hops away. Validation that keeps pace with a fast-moving system has to plan, execute, observe, and maintain itself as the system evolves. That is the role of Testing Fleets: coordinated agents, rather than scripts that rot the moment the architecture shifts.
Notice that neither of these is a velocity metric. They are the instruments that tell you whether your velocity has a sign. A change that is fast and moves the system toward a verified-safe state is progress. A change that is equally fast and moves it toward an undiscovered regression is, mathematically, negative progress wearing a green checkmark.
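To make "knowing what a change can reach" concrete, here is a minimal sketch, assuming the system graph is nothing more than a dictionary of which services depend on which. The service names, the `DEPENDENTS` and `CHECKS` tables, and both function names are hypothetical illustrations, not the API of any real product. The point is only that once the graph is machine-readable, selecting validation by reach is a short graph walk.

```python
from collections import deque

# Hypothetical system graph: each service maps to the services that depend on it.
DEPENDENTS = {
    "auth": ["checkout", "profile"],
    "checkout": ["orders"],
    "orders": ["billing"],
    "profile": [],
    "billing": [],
    "search": [],
}

# Hypothetical mapping from each service to the checks that exercise it.
CHECKS = {
    "auth": ["auth_contract"],
    "checkout": ["checkout_e2e"],
    "orders": ["orders_integration"],
    "billing": ["billing_reconciliation"],
    "profile": ["profile_smoke"],
    "search": ["search_relevance"],
}


def reachable_from(changed, dependents):
    """Breadth-first walk: every service a change can reach, with its hop count."""
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for downstream in dependents.get(node, []):
            if downstream not in hops:
                hops[downstream] = hops[node] + 1
                queue.append(downstream)
    return hops


def checks_for_change(changed, dependents, checks):
    """Select only the checks for services the change can reach, nearest first."""
    hops = reachable_from(changed, dependents)
    ordered = sorted(hops, key=lambda service: (hops[service], service))
    return [check for service in ordered for check in checks.get(service, [])]


print(reachable_from("auth", DEPENDENTS))
print(checks_for_change("auth", DEPENDENTS, CHECKS))
```

In this toy graph a change to `auth` reaches `billing` three hops away and never reaches `search`, so the billing check runs and the search check does not. A static suite would have run both, or neither, regardless of what moved.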
Motion is auditable; progress requires evidence
Here is the test I would put to any reliability program. Pull your last five releases. For each one, can the system produce a record of what the change was understood to touch, what was validated against it, what was found, what was done about it, and who authorized the result? If the honest answer is "we have the deploy log and the dashboard," you are measuring motion. You can prove that something happened. You cannot prove that it was the right thing.
Progress is the claim that a change improved or preserved the system's reliability, and a claim of that kind requires evidence, not status. "Tests passed" is a status. An audit-ready record of what was tested against the live system graph, what regressed, what was remediated, and who signed off is evidence. The first survives a standup. The second survives a board question, a breach inquiry, or a regulator. The gap between them is the gap between motion and clarity, and no amount of deploy frequency closes it.
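The difference between a status and evidence can be sketched as a data shape. The following is a minimal illustration, assuming a release record with one field per question in the five-release test above; `ReleaseRecord`, its field names, and `missing_evidence` are hypothetical, chosen for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseRecord:
    release_id: str
    touched: list = field(default_factory=list)       # what the change was understood to touch
    validated: list = field(default_factory=list)     # what was validated against it
    findings: list = field(default_factory=list)      # what was found (may legitimately be empty)
    remediations: list = field(default_factory=list)  # what was done about the findings
    authorized_by: str = ""                           # who authorized the result


def missing_evidence(record):
    """Return the names of the questions this record cannot answer."""
    gaps = []
    if not record.touched:
        gaps.append("touched")
    if not record.validated:
        gaps.append("validated")
    # Findings with no recorded response are an open question, not evidence.
    if record.findings and not record.remediations:
        gaps.append("remediations")
    if not record.authorized_by:
        gaps.append("authorized_by")
    return gaps


# A record that only knows a deploy happened is a status, not evidence.
status_only = ReleaseRecord("release-101")
print(missing_evidence(status_only))
```

A deploy log alone produces the `status_only` shape: it proves something happened and leaves the other questions unanswered. The size of the list returned for your real releases is a rough measure of the gap described above.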
This is also why the action half of the loop has to be governed rather than merely fast. The most reckless way to improve a velocity number would be to let agents rewrite production code unsupervised. Remediation is the hardest and most consequential part of the work, which is exactly why the engineering is in the governance: policy, approval, and audit. Agents propose; humans authorize. Speed at the cost of accountability is just motion with a bigger blast radius. Governed remediation and the governance layer around it exist to make the directed change provable.
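"Agents propose; humans authorize" can be expressed as a small invariant: nothing is applied without a named human approver, and every attempt is logged. The sketch below is one hedged way to write that down. The `Proposal` type, the function names, and the approver set are all assumptions made for illustration, not a description of how any particular governance layer is implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    change_id: str
    proposed_by: str                   # the agent that drafted the fix
    summary: str
    approved_by: Optional[str] = None  # stays None until a human signs off
    audit: list = field(default_factory=list)


def propose(change_id, agent, summary):
    """An agent drafts a change; drafting is recorded but changes nothing."""
    proposal = Proposal(change_id, agent, summary)
    proposal.audit.append(("proposed", agent))
    return proposal


def authorize(proposal, approver, human_approvers):
    """Only a listed human approver may sign off; refusals are logged too."""
    if approver not in human_approvers:
        proposal.audit.append(("rejected_approver", approver))
        raise PermissionError(f"{approver} may not authorize changes")
    proposal.approved_by = approver
    proposal.audit.append(("authorized", approver))


def apply(proposal):
    """Refuse to apply anything that lacks a recorded human approval."""
    if proposal.approved_by is None:
        proposal.audit.append(("blocked", proposal.proposed_by))
        raise PermissionError("unapproved proposal cannot be applied")
    proposal.audit.append(("applied", proposal.approved_by))
    return True
```

The design choice worth noticing is that the blocked attempt is written to the audit trail as well. A governed loop records what was refused, not only what shipped.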
What to do Monday morning
You do not need to abandon your velocity dashboard. You need to demote it from a goal to a diagnostic, and add the instruments that measure direction.
- Add a direction column. Next to deploy frequency and lead time, track change failure rate and time-to-restore. They are the closest standard proxies for sign, and they expose whether your speed is pointed the right way.
- Make change-awareness a requirement, not an aspiration. Any validation that cannot tell you what a specific change reaches is testing in the dark. Prioritize context over raw test count.
- Demand evidence from one workflow. Pick a single high-stakes release path and require it to emit an audit-ready record, not a status. The friction you encounter is the size of your clarity gap.
- Find where speed was bought with bypassed policy. If a guardrail is advisory, assume it is being routed around, and make one gate executable and unavoidable rather than adding a sixth that gets ignored.
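The direction column in the first item is cheap to compute if you already keep a deploy log and an incident log. Here is a minimal sketch, assuming each deploy row carries a `failed` flag and each incident row carries start and restore timestamps. The field names and the sample data are invented for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical deploy log: one row per production deploy.
deploys = [
    {"id": "d1", "failed": False},
    {"id": "d2", "failed": True},
    {"id": "d3", "failed": False},
    {"id": "d4", "failed": True},
    {"id": "d5", "failed": False},
]

# Hypothetical incident log: one row per failure that needed restoring.
incidents = [
    {"deploy": "d2",
     "started_at": datetime(2024, 1, 8, 10, 0),
     "restored_at": datetime(2024, 1, 8, 10, 30)},
    {"deploy": "d4",
     "started_at": datetime(2024, 1, 9, 14, 0),
     "restored_at": datetime(2024, 1, 9, 15, 30)},
]


def change_failure_rate(deploys):
    """Share of deploys that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d["failed"]) / len(deploys)


def median_minutes_to_restore(incidents):
    """Median time from failure start to restoration, in minutes."""
    minutes = [
        (i["restored_at"] - i["started_at"]).total_seconds() / 60
        for i in incidents
    ]
    return median(minutes) if minutes else None


print(change_failure_rate(deploys))          # 0.4
print(median_minutes_to_restore(incidents))  # 60.0
```

Read the two numbers next to deploy frequency, not instead of it. Five deploys with a 0.4 failure rate is a different week from five deploys with none, and the frequency metric alone cannot tell them apart.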
The smartest validation techniques compound when they run on shared context. Reachability-based prioritization, for instance, can mean 70 to 90% less exploitable exposure, but only when it queries a live model of the system rather than guessing in isolation. Direction is cheaper to measure than you think, once visibility is a first-class part of the stack rather than an afterthought.
The bottom line
