My Engineers Don't Hate Building Software. They Hate Testing It.
An offhand complaint from a CTO exposed the real bottleneck in modern software: not building, but proving what you built is safe to ship. The origin of a category.
The complaint that was actually a diagnosis
If you have run an engineering org, you know the line is true before you finish reading it. Nobody got into this work to write tests. They got into it to build things. Building is the part with leverage, momentum, and the satisfaction of watching an idea become a running system. Testing is the tax. It is the part where progress stops feeling like progress and starts feeling like proving a negative to an audience of skeptics, most of whom are your own future selves at 2 a.m.
So engineers do what humans do with a tax they resent. They minimize it. They write the happy-path test, skip the messy integration case, mark the flaky suite as "known flaky," and ship. Not because they are reckless. Because the validation work is structurally worse than the building work: slower, less creative, less rewarded, and never done. The codebase grows linearly. The space of things that could break grows combinatorially. The gap between what you built and what you can vouch for widens with every release.
That gap is the real bottleneck. Not velocity. Not headcount. The distance between "it's built" and "I can prove it's safe to ship."
Why the gap is now structural, not cultural
For a long time you could paper over this with discipline. A strong testing culture, a respected QA function, a release manager who said no. Those mitigations assumed something that is no longer true: that a human wrote most of the code and could, in principle, reason about it.
That assumption is gone. By some estimates, roughly 41% of code is now AI-generated. Code arrives faster than any team can read it, let alone reason about its second-order effects. And it arrives carrying risk: industry research puts the share of AI coding tasks that introduce a critical flaw or security issue near 45%. Generation became nearly free. Validation did not get cheaper at all. If anything it got harder, because the thing you are validating was authored by a system that does not understand your threat model, your data boundaries, or how a change to one service ripples through forty others.
So the resentment my CTO friend described is no longer a culture problem you can fix with a brown-bag talk about test coverage. It is a structural mismatch. You have machine-speed authorship feeding into human-speed validation, and the queue between them grows without bound. The cost of poor software quality is estimated at around $2.41 trillion a year in the US alone. That number is not abstract. It is the aggregate of every incident, breach, and rework cycle that flows from shipping changes nobody could fully vouch for.
The tell: 80% of developers route around the controls
Here is the part that turned a complaint into a conviction. When validation is friction without leverage, people route around it. Roughly 80% of developers bypass policy or guardrails. Read that as the verdict, not the crime.
A guardrail that four out of five developers route around is not a guardrail. It is theater that produces a paper trail and a false sense of safety. And the standard response to that finding is exactly wrong: add another scanner, another gate, another required check. Each one adds friction. None adds leverage. You are taxing the part of the job your engineers already hate, and acting surprised when they evade the tax more aggressively.
The engineers are not the problem. The architecture of validation is the problem. We bolted controls onto the outside of the work, made them slow and noisy, and disconnected them from any understanding of what a given change actually touches. Of course they get bypassed. They were designed, accidentally, to be bypassable.
Testing was never the thing they hated
Sit with the original complaint long enough and it inverts. Engineers don't hate testing. They hate the version of testing we built for them:
- Static scripts that rot. A test suite is a snapshot of a system that no longer exists by the next sprint. Maintaining it is unpaid, unloved labor.
- Validation that is blind to change. Re-running everything on every commit, because nothing in the stack knows what this specific change can actually reach. Brute force masquerading as rigor.
- Findings without decisions. Five dashboards, four queues, zero answers to the only question that matters: is this release safe, and can I prove why?
- Controls that generate work instead of removing it. Every tool adds a policy surface and a backlog. Confidence at release time does not move.
What engineers actually want is the opposite. They want validation that keeps pace with the system, scopes itself to what changed, and hands them a defensible answer instead of another inbox. They want the tax to become infrastructure. That reframing is the whole thesis: the missing piece is not a better model or another point tool. It is a control layer that owns the question of what is allowed to ship, and answers it with evidence. That is the case I make in detail in "Why AI Is Missing a Control Layer, Not More Models."
What a control layer does with the part they hate
If the failure mode is human-speed validation behind machine-speed authorship, the fix is to make validation itself a governed, continuous system rather than a manual chore. Concretely, that means a few capabilities most stacks have never had in one place.
- A live map of the system. You cannot make validation change-aware unless you know what changed and what it touches. A System Graph of services, dependencies, and CI/CD turns "test everything, every time" into "test what this change can actually reach." That alone removes most of the brute-force tedium engineers resent.
- Validation that maintains itself. Testing Fleets are coordinated agents that plan, execute, observe, and maintain validation as the system evolves. Not static scripts that rot, but coverage that moves with the code. The unloved maintenance labor becomes the machine's job.
- Governed fixing, not unsupervised fixing. When something breaks, the layer can propose a remediation, but a human authorizes it. Remediation Fleets operate under policy, approval, and audit. Agents propose; humans authorize. Letting agents rewrite production unsupervised is not autonomy. It is an incident with a longer fuse.
- Evidence as the output. Not a green check. An audit-ready record of what was tested, what was found, what was fixed, and who approved it.
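To make the "live map" idea concrete, here is a minimal sketch of change-aware test selection. It assumes a toy dependency map and a toy service-to-suite mapping. Every name in it (`DEPENDS_ON`, `SUITES`, `suites_for_change`, the service names) is hypothetical and exists only for illustration. It is not how any particular System Graph is implemented, only one way the scoping step could work.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger"],
    "cart": ["catalog"],
    "search": ["catalog"],
    "ledger": [],
    "catalog": [],
}

# Hypothetical mapping of services to the test suites that cover them.
SUITES = {
    "checkout": ["test_checkout_flow"],
    "payments": ["test_payments_api"],
    "cart": ["test_cart"],
    "search": ["test_search"],
    "ledger": ["test_ledger"],
    "catalog": ["test_catalog"],
}


def reverse_edges(depends_on):
    """Invert the map: service -> services that depend on it."""
    dependents = {name: set() for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)
    return dependents


def impacted_services(changed, depends_on):
    """Breadth-first walk from the changed services to everything that depends on them."""
    dependents = reverse_edges(depends_on)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def suites_for_change(changed, depends_on=DEPENDS_ON, suites=SUITES):
    """Return only the suites covering services the change can reach."""
    impacted = impacted_services(changed, depends_on)
    return sorted(s for svc in impacted for s in suites.get(svc, []))


# A change to "ledger" reaches payments and checkout, but not search or cart.
print(suites_for_change(["ledger"]))
```

In this toy graph a change to `ledger` selects three suites out of six. The point is the shape of the query, not the ratio: the graph decides what runs, not a blanket "run everything" rule.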
The mechanism that ties these together is a closed loop: understand the system, test against it, reproduce what fails, remediate under governance, verify the fix held. Run that loop continuously and the thing engineers hated stops being a manual tax. It becomes a property of the platform they build on.
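The loop is easier to reason about when it is written down. The sketch below is a deliberately small model of it, assuming the test runner, the fix proposer, and the human approval step are all passed in as plain functions. `run_loop`, `Proposal`, `LoopResult`, and the stub values (`pr-123`, `fix-timeout`, `alice`) are hypothetical names, not a real API. What it illustrates is the ordering: nothing is applied without an approver, and every step lands in the audit log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Proposal:
    finding: str
    patch: str
    approved_by: Optional[str] = None  # set only by the human approval step


@dataclass
class LoopResult:
    shipped: bool
    audit_log: List[Tuple[str, str]] = field(default_factory=list)


def run_loop(
    change: str,
    run_tests: Callable[[str], List[str]],
    propose_fix: Callable[[str], str],
    ask_human: Callable[[Proposal], Optional[str]],
) -> LoopResult:
    """Understand, test, reproduce, remediate under approval, verify."""
    result = LoopResult(shipped=False)
    log = result.audit_log.append

    log(("understand", f"scoped change: {change}"))
    failures = run_tests(change)
    log(("test", f"{len(failures)} failure(s)"))

    for finding in failures:
        log(("reproduce", finding))
        proposal = Proposal(finding=finding, patch=propose_fix(finding))
        proposal.approved_by = ask_human(proposal)  # agents propose, humans authorize
        if proposal.approved_by is None:
            log(("remediate", f"rejected: {proposal.patch}"))
            return result  # no approval, no fix, no ship
        log(("remediate", f"{proposal.patch} approved by {proposal.approved_by}"))
        change = f"{change}+{proposal.patch}"

    remaining = run_tests(change)
    log(("verify", f"{len(remaining)} failure(s) after remediation"))
    result.shipped = not remaining
    return result


# Stub collaborators, purely for illustration.
def fake_tests(change):
    return [] if "fix-timeout" in change else ["payments timeout"]


result = run_loop("pr-123", fake_tests, lambda f: "fix-timeout", lambda p: "alice")
for step, detail in result.audit_log:
    print(step, "-", detail)
```

Swapping the approver stub for one that returns `None` stops the loop at the remediation step and leaves `shipped` false, which is the governed behavior the loop is meant to guarantee.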
There is a sharp, measurable version of this leverage. Reachability-based prioritization, which asks whether a flagged vulnerability is actually reachable in your system, can cut the exposure you have to triage by 70 to 90%. But reachability is only as good as your map. A scanner without system context guesses. A control layer that already maintains a live dependency graph answers reachability as a native query and carries that judgment into the release decision. Validation gets smarter when it runs on shared context.
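As a rough illustration of reachability as a graph query, the sketch below splits findings into "reachable from an entry point" and "not reachable." The call graph, component names, and finding IDs are invented for the example, and real reachability analysis works at a much finer grain (functions, data flow, configuration). Treat it as a sketch of the idea under those simplifying assumptions.

```python
def reachable_from(entry_points, calls):
    """Return every component reachable from the entry points (iterative DFS)."""
    seen = set()
    stack = list(entry_points)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(calls.get(node, ()))
    return seen


def triage(findings, entry_points, calls):
    """Split findings into those on a reachable component and those that are not."""
    reachable = reachable_from(entry_points, calls)
    urgent = [f for f in findings if f["component"] in reachable]
    deferred = [f for f in findings if f["component"] not in reachable]
    return urgent, deferred


# Hypothetical call graph: component -> components it calls.
CALLS = {
    "api-gateway": ["orders"],
    "orders": ["libjson"],
    "batch-report": ["libxml"],  # not wired to any entry point in this example
}

# Hypothetical scanner output.
FINDINGS = [
    {"id": "F-1", "component": "libjson"},
    {"id": "F-2", "component": "libxml"},
]

urgent, deferred = triage(FINDINGS, entry_points=["api-gateway"], calls=CALLS)
print("triage now:", [f["id"] for f in urgent])
print("deferred:", [f["id"] for f in deferred])
```

The quality of the answer depends entirely on `CALLS` being current. With a stale or missing graph, the same function sorts findings into the wrong bucket with full confidence.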
What to do Monday morning
You do not need to rip out your stack to start. You need to change what owns the part your engineers hate.
- Ask who actually decides a release is safe. If the answer is "the person on the deploy reading five dashboards," you have a control gap, not a tooling gap.
- Count the bypass. Where do engineers route around your guardrails, and why? Each instance is a control that added friction without leverage. Those are the controls to consolidate, not multiply.
- Make change-awareness the requirement. Any validation that cannot tell you what a specific change reaches is testing in the dark. Prioritize context over raw test volume.
- Demand evidence, not status. "Tests passed" is a status. "Here is what we tested, found, fixed, and who signed off" is evidence. Only the second survives an audit or a breach.
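The difference between status and evidence can be shown as a data shape. The sketch below assumes a release is described by what was tested, what was found, how each finding was fixed, and who approved it. The field names and sample values are hypothetical. The SHA-256 digest only makes the record tamper-evident. It is not a cryptographic signature, which would need a signing key and is out of scope here.

```python
import hashlib
import json


def build_evidence(release, tested, findings, approver):
    """Assemble an evidence record and derive the ship decision from it."""
    record = {
        "release": release,
        "tested": sorted(tested),
        "findings": findings,
        "approved_by": approver,
    }
    # A finding with no recorded fix blocks the release.
    unresolved = [f["id"] for f in findings if not f.get("fixed_by")]
    record["unresolved"] = unresolved
    record["safe_to_ship"] = bool(approver) and not unresolved
    # Digest over a canonical JSON form so any later edit is detectable.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["digest"] = hashlib.sha256(canonical).hexdigest()
    return record


# Hypothetical release with one finding that was fixed and signed off.
evidence = build_evidence(
    release="release-candidate-1",
    tested=["test_payments_api", "test_checkout_flow"],
    findings=[{"id": "F-1", "summary": "timeout in payments retry", "fixed_by": "patch-17"}],
    approver="release-owner@example.com",
)

# Same release, but the finding has no fix recorded.
blocked = build_evidence(
    release="release-candidate-1",
    tested=["test_payments_api", "test_checkout_flow"],
    findings=[{"id": "F-1", "summary": "timeout in payments retry", "fixed_by": None}],
    approver="release-owner@example.com",
)

print(evidence["safe_to_ship"], blocked["safe_to_ship"], blocked["unresolved"])
```

"Tests passed" would collapse both of these releases into the same green check. The record keeps them apart and says why.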
Consider a hypothetical fintech team merging forty AI-assisted PRs a day across tangled services. A sixth scanner gives them a sixth queue and a sixth thing to bypass. A control layer above the stack gives them one scoped answer per release, with a signed record of the call. That is the difference between taxing the work they hate and removing it.
The bottom line
