The Remediation Metrics That Matter: Mean-Time-to-Governed-Fix, Revert Rate, and Recurrence

MTTR rewards fast diffs, not safer systems. Govern autonomous remediation on mean-time-to-governed-fix, revert rate, recurrence, and reachable-risk instead.

Zof Reliability Team · Engineering & product

January 7, 2025 · 7 min read · Updated January 7, 2025

Why MTTR went vanity the moment fixing got automated

MTTR was a defensible proxy when humans wrote the fix. A human applying a patch was an implicit quality gate. They understood the blast radius, they hesitated before touching a payment path, and the time the clock measured included the time they spent thinking. The number correlated with care.

Automated remediation breaks that correlation. An agent can close a ticket in ninety seconds, and the speed tells you nothing about whether the change was correct, whether it touched something it shouldn't have, or whether the same defect recurs next week under a different stack trace. You can drive MTTR to near zero by shipping fast, wrong fixes. The metric will applaud.

This is not a hypothetical edge case at current code volumes. Roughly 41% of codebases are now AI-generated, and around 45% of AI coding tasks introduce critical flaws or security issues. The defects are arriving faster than humans can review them, which is exactly why teams reach for autonomous fixing. But if the metric governing that fixing rewards diff throughput, you have built a machine that generates risk faster and calls it resolution. The cost of poor software quality already sits near $2.41 trillion. Optimizing for speed-to-close without governing for correctness is how you contribute to that number while reporting a green dashboard.

The fix is not to slow down. It is to measure the things that actually distinguish a governed fix from a fast one.

1. Mean-Time-to-Governed-Fix (MTTGF)

Replace MTTR with the time from defect detection to a fix that is validated, policy-checked, authorized, and verified in production. The stopwatch does not stop when a diff merges. It stops when the change has cleared the control loop: validated against its real dependencies, checked against policy, authorized by a named human where policy requires it, and confirmed to have actually resolved the underlying defect.

That is a harder number to move, and that is the point. MTTGF refuses to give you credit for a fix that hasn't been proven. It includes the validation and authorization time that vanity MTTR conveniently omits.

A skeptical reader will object that this just makes the number look worse. Correct, at first. But it makes the number *honest*, and an honest number that trends down over a quarter is a defensible ROI story. A vanity number that's already near zero has nowhere to go and proves nothing. The closed loop is what makes MTTGF measurable at all: Understand the change's real scope, Test it, Reproduce the failure deterministically, Remediate, Verify. Each stage stamps the evidence the metric depends on.
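As a rough sketch of how the governed-fix clock could be computed, assume each defect record carries one timestamp per gate. The `FixRecord` fields and both function names below are illustrative assumptions, not part of any product API. The clock stops at the last gate cleared, and a fix that has only merged earns no time at all.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FixRecord:
    """Timestamps for one defect. Every field name here is hypothetical."""
    detected_at: datetime
    merged_at: Optional[datetime] = None          # recorded, but never stops the clock
    validated_at: Optional[datetime] = None
    policy_checked_at: Optional[datetime] = None
    authorized_at: Optional[datetime] = None
    authorization_required: bool = True           # False when policy needs no named human
    verified_in_prod_at: Optional[datetime] = None


def governed_fix_time(fix: FixRecord) -> Optional[timedelta]:
    """Time from detection to the last gate cleared, or None if any gate is still open."""
    gates = [fix.validated_at, fix.policy_checked_at, fix.verified_in_prod_at]
    if fix.authorization_required:
        gates.append(fix.authorized_at)
    if any(g is None for g in gates):
        return None  # not yet a governed fix, so it earns no credit
    return max(gates) - fix.detected_at


def mean_time_to_governed_fix(fixes: list[FixRecord]) -> Optional[timedelta]:
    """Average over fixes that cleared every gate. None when no fix has."""
    durations = [d for d in map(governed_fix_time, fixes) if d is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Because fixes with an open gate are excluded from the average, it is worth reporting how many are still open alongside the mean, so unfinished work can't quietly flatter the number.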

2. Revert rate (the fix that didn't hold)

Revert rate is the percentage of shipped fixes that get rolled back, hotfixed, or superseded within a defined window. It is the single most honest signal that your remediation is churning diffs rather than reducing risk, because a revert is the system telling you the fix was wrong, incomplete, or had side effects nobody caught.

Vanity MTTR hides reverts. A fix that ships in two minutes and gets reverted in two hours can still report a low MTTR for the original ticket. The revert becomes a *new* ticket with its own fast resolution, and the dashboard shows two quick wins instead of one failure. Revert rate collapses that illusion into a single number you can't game.

Watch for two failure modes:

The thrash loop. A fix, a revert, a re-fix, another revert. Each cycle scores well on MTTR and terribly on revert rate. This is the signature of unsupervised autonomous fixing with no verification step.
The silent supersede. A fix that's quietly replaced by a different change days later, never formally reverted. Track supersession alongside reverts or the thrash hides in plain sight.
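One way the definition and both failure modes could be folded into a single number is sketched below. The `ShippedFix` record, its field names, and the 14-day default window are assumptions for illustration only. The window should be whatever your team defines. Supersession counts the same as a formal revert, so the silent supersede cannot hide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ShippedFix:
    """One shipped fix. Field names are hypothetical."""
    fix_id: str
    shipped_at: datetime
    reverted_at: Optional[datetime] = None     # rolled back or hotfixed
    superseded_at: Optional[datetime] = None   # quietly replaced by a later change


def did_not_hold(fix: ShippedFix, window: timedelta) -> bool:
    """True if the fix was reverted or superseded within the window after shipping."""
    deadline = fix.shipped_at + window
    events = [t for t in (fix.reverted_at, fix.superseded_at) if t is not None]
    return any(fix.shipped_at <= t <= deadline for t in events)


def revert_rate(fixes: list[ShippedFix], window: timedelta = timedelta(days=14)) -> float:
    """Fraction of shipped fixes that did not hold. 0.0 for an empty list."""
    if not fixes:
        return 0.0
    return sum(did_not_hold(f, window) for f in fixes) / len(fixes)
```

A thrash loop shows up here as several consecutive fixes for the same defect that each fail `did_not_hold`, even though every one of them closed its own ticket quickly.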

A low revert rate is what earns the trust to widen autonomy. It is the metric that lets you tell a board that agents are fixing more *and* breaking less.

3. Recurrence rate

Recurrence is the percentage of defects that return after being marked resolved, whether as the identical bug or the same root cause wearing a different stack trace. It answers the question MTTR structurally cannot: did we fix the *problem*, or did we mute the *symptom*?

This is where most remediation programs leak value. A fix that suppresses an error without addressing its cause will pass tests, close the ticket, and reduce MTTR, and the defect will resurface under marginally different conditions. You pay the remediation cost repeatedly and never retire the risk. High recurrence with low MTTR is the precise fingerprint of symptom-patching at scale.

Measuring recurrence properly requires that your remediation is reproduction-grounded. If you can't deterministically reproduce a failure, you can't prove that a fix addressed its cause rather than coincidentally clearing the alert. This is the Reproduce stage of the loop doing load-bearing work: a fix verified against a reproduced failure is a fix you can claim retired the defect. Reliability Analytics is where recurrence becomes visible as a trend, because recurrence only shows up over time and across incidents that a per-ticket view can't connect.
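A minimal sketch of the calculation follows. It assumes each defect already carries a root-cause fingerprint that stays stable when the stack trace changes. That fingerprint, the `Defect` record, and the function name are all hypothetical. Producing a trustworthy fingerprint is the hard part and is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Defect:
    """One defect report. Field names are hypothetical."""
    defect_id: str
    root_cause: str                       # assumed stable root-cause fingerprint
    opened_at: datetime
    resolved_at: Optional[datetime] = None


def recurrence_rate(defects: list[Defect]) -> float:
    """Fraction of resolved defects whose root cause was reported again afterwards."""
    by_cause = defaultdict(list)
    for d in defects:
        by_cause[d.root_cause].append(d)

    resolved = recurred = 0
    for group in by_cause.values():
        for d in group:
            if d.resolved_at is None:
                continue  # still open, so it cannot have recurred yet
            resolved += 1
            # A later report with the same root cause means the fix muted a symptom.
            if any(o is not d and o.opened_at > d.resolved_at for o in group):
                recurred += 1
    return recurred / resolved if resolved else 0.0
```

Grouping by root cause rather than by ticket is what lets the same defect be recognized when it returns under a different stack trace.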

4. Reachable-risk burndown (not finding count)

Counting how many findings remediation closed is the issue-tracker version of vanity MTTR. Most findings aren't exploitable from a live entry point, so a high close count can represent a lot of motion against low-risk noise while the genuinely dangerous, reachable defects wait in the queue.

Measure burndown of reachable risk instead: the exploitable exposure that is actually reachable from a live path. Reachability-based prioritization can cut the exploitable exposure you have to triage by 70-90%, so the metric stops flattering busywork and starts tracking risk that matters. A System Graph is what makes this computable, because reachability is a property of the dependency map. The graph knows whether a vulnerable function sits on a path from a real entry point or in dead code no request ever hits. Remediation governed on reachable-risk burndown puts agent effort where the danger is, instead of wherever the close count is easiest to run up.
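To make the idea concrete, here is a small sketch that treats the dependency map as a plain adjacency dictionary and scores only findings that sit on a node reachable from a live entry point. The graph shape, the severity scores, and the function names are assumptions for illustration. A real System Graph would be far richer than a dictionary of call edges.

```python
from collections import deque


def reachable_nodes(graph: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Breadth-first search outward from every live entry point."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def reachable_risk(
    findings: list[tuple[str, float]],
    graph: dict[str, list[str]],
    entry_points: list[str],
) -> float:
    """Sum of severity scores for open findings located on reachable nodes."""
    live = reachable_nodes(graph, entry_points)
    return sum(score for node, score in findings if node in live)
```

Burndown is then the change in `reachable_risk` between two snapshots of open findings. Closing a high-severity finding in dead code moves the finding count but leaves this number untouched.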

5. Authorization integrity (the governance metric)

If agents are proposing fixes, you must measure whether the human-authorization model is actually holding. Authorization integrity asks: what percentage of shipped fixes followed the policy that governs them? Were payment-path changes approved by a named human? Did anything reach production through a bypass?

This metric exists because the alternative is the failure mode this entire category is built to prevent. Around 80% of developers already bypass policy and guardrails, and a fast autonomous fixer is the easiest thing in your stack to wave through. "Agents propose; humans authorize" is only real if you can prove it after the fact. The governance posture is the engineering, not the paperwork.

Remediation Fleets propose fixes; Governance records who authorized what, under which policy, with a reproducible audit trail. For teams that can't send code to a vendor cloud, Edge Runners run the loop as signed capsules inside your boundary and produce the same audit-ready evidence. When an incident review or a regulator asks why a change shipped, authorization integrity is the difference between an answer and an apology.
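As a sketch of what measuring this could look like, assume an audit log in which each shipped change records the paths it touched, the named approver if there was one, and whether it reached production around the gate. The `ShippedChange` record, the example policy prefix, and the function names are hypothetical and stand in for whatever your written policy actually says.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShippedChange:
    """One change that reached production. Field names are hypothetical."""
    change_id: str
    paths: list[str]                      # files the change touched
    approved_by: Optional[str] = None     # named human approver, if any
    bypassed_checks: bool = False         # reached production around the gate


# Hypothetical policy: path prefixes that require a named human approver.
HUMAN_REQUIRED_PREFIXES = ("services/payments/",)


def followed_policy(change: ShippedChange) -> bool:
    """A bypass always fails. A protected path fails without a named approver."""
    if change.bypassed_checks:
        return False
    needs_human = any(p.startswith(HUMAN_REQUIRED_PREFIXES) for p in change.paths)
    return change.approved_by is not None or not needs_human


def authorization_integrity(changes: list[ShippedChange]) -> float:
    """Fraction of shipped changes that followed policy. 1.0 (vacuously) if none shipped."""
    if not changes:
        return 1.0
    return sum(followed_policy(c) for c in changes) / len(changes)
```

Anything below 1.0 is worth investigating change by change, since each miss is a fix that shipped outside the authorization model.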

What to do Monday morning

You don't need to instrument all five at once. You need to stop trusting MTTR and start trusting the metrics that distinguish a governed fix from a fast one.

Add revert rate to your remediation dashboard this week. It's the cheapest honest signal, and you almost certainly already have the data.
Define your governed-fix clock. Decide what "done" means: validated, policy-checked, authorized, verified. Measure to that line, not to merge.
Stop reporting finding counts to leadership. Replace them with reachable-risk burndown so effort tracks danger.
Write your authorization policy down. If you can't state which fixes need a named human, you can't measure whether the rule held.

Prove the metrics on one service, watch revert and recurrence fall, then widen autonomy as the evidence compounds.

The bottom line
