Autonomous Reliability

The Control Layer Maturity Model: From Alerts to Autonomous, Authorized Action

A four-stage maturity model for software reliability (manual checks, dashboards, gated automation, governed autonomy) so engineering leaders can self-locate and act.

Zof Reliability Team · Engineering & Product

February 25, 2026 · 8 min read · Updated February 25, 2026

01

How to read the model

Each stage answers one question better than the last: can the system act on what it knows? That is the axis that matters. Detection improves quickly and then plateaus. The hard, expensive frontier is closing the distance between knowing a problem exists and resolving it under control.

A few rules before you self-locate:

  • You are at the stage of your weakest critical path, not your best one. A team with beautiful automated canaries that still resolves payment incidents by hand is operating at the manual stage where it counts.
  • Stages are cumulative. Governed autonomy does not replace dashboards; it consumes their signals and adds an action boundary on top.
  • Skipping is how programs break. Teams that jump from dashboards straight to "let the agent fix it" without a policy and audit layer are not advanced. They are exposed.
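
The weakest-path rule above can be sketched in a few lines. This is only an illustration: the `Stage` enum, the `effective_stage` helper, and the path names are assumptions made for the example, not part of any Zof API.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The four stages of the model, ordered so that min() finds the weakest."""
    MANUAL_CHECKS = 1
    DASHBOARDS_AND_ALERTS = 2
    GATED_AUTOMATION = 3
    GOVERNED_AUTONOMY = 4


def effective_stage(critical_paths: dict[str, Stage]) -> Stage:
    """Return the lowest stage found on any critical path (weakest-link rule)."""
    if not critical_paths:
        raise ValueError("at least one critical path is required")
    return min(critical_paths.values())


# Hypothetical self-assessment; the path names and ratings are illustrative.
paths = {
    "deploy-canary": Stage.GATED_AUTOMATION,
    "payment-incident-response": Stage.MANUAL_CHECKS,
}
print(effective_stage(paths).name)  # prints "MANUAL_CHECKS"
```

Taking the minimum rather than the average is the point: one well-automated path does not raise the stage of a critical path that is still resolved by hand.
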
02

Stage 1: Manual checks

The system knows nothing on its own. Reliability lives in people's heads, runbooks, and pre-release checklists. A senior engineer eyeballs the diff, someone runs a smoke test by hand, and the release goes out on the strength of human attention.

This is not contemptible. Manual checks catch real problems and encode hard-won judgment. The trouble is that they do not scale and they do not survive turnover. The checklist is current until the one person who maintained it leaves. Coverage is whatever the on-call engineer remembered at 2 a.m.

The ceiling: human attention is a fixed, expensive resource, and your change volume is not fixed. With roughly 41% of codebases now AI-generated and around 45% of AI coding tasks introducing critical flaws or security issues, the rate of change that needs review is climbing faster than any team can staff against. Manual review becomes a rubber stamp the moment the queue outgrows the reviewers, which it always does.

You are here if: your release confidence depends on which humans are awake, and your "process" is a wiki page.

03

Stage 2: Dashboards and alerts

The system now observes itself. You have instrumented services, you have SLOs, you have alert routing. Mean time to detection drops sharply, and for a while it feels like a transformation. You can finally see what is happening.

Seeing is the value, and seeing is also the trap. A dashboard observes and, with thresholds and burn-rate alerts, encodes a thin slice of decision. But it does not act. It fires a page; a human interprets; a human executes the fix through a different tool entirely. The control logic still lives in someone's head, exactly as it did at stage one, just with better inputs.
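
To make the "thin slice of decision" concrete, here is a minimal sketch of a burn-rate alert. The function names are hypothetical, and the default threshold of 14.4 is an assumption (a commonly cited fast-burn value for a one-hour window against a 30-day SLO), not something the model prescribes.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Error-budget burn rate: observed error ratio divided by the allowed error ratio."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target
    return (bad_events / total_events) / error_budget


def should_page(rate: float, threshold: float = 14.4) -> bool:
    """The entire 'decision' a stage-two alert encodes: page a human or stay quiet."""
    return rate >= threshold


# 30 failures in 1,000 requests against a 99.9% SLO burns budget about 30x too fast.
rate = burn_rate(bad_events=30, total_events=1000, slo_target=0.999)
print(should_page(rate))  # prints "True"
```

Note what the return type is: a boolean that wakes someone up. Nothing in this code can roll back, scale, or fix anything, which is the stage-two ceiling in miniature.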

The ceiling: alert fatigue, which is a control failure wearing an observability costume. When every signal routes to a person and nothing can be resolved automatically within policy, the only scaling lever is more people staring at more screens. You can watch security debt accumulate in real time and remain structurally unable to stop it. The dashboard renders the blast radius of a bad config change beautifully. Rendering is not rollback. This is the difference between a control plane and a dashboard: visibility tells you a problem exists, but it has no authority to do anything about it.

You are here if: detection is excellent, your postmortems read like reruns, and "resolution" is still a human reading a graph under pressure.

04

Stage 3: Gated automation

The system can now act, within narrow and predefined lanes. You have automated canary analysis, automatic rollback on a failed health check, CI gates that block a merge when a test fails. Action is finally part of the architecture, not just notification.

This is real progress, and most strong engineering orgs live here. But gated automation has two characteristic failure modes that no amount of additional scripting fixes.

First, the gates are static while the system is not. A test suite runs the same assertions whether you touched a payment path or a footer link. A CI gate knows whether tests passed but not whether the changed code is even reachable in production. The automation is brittle precisely because it has no live model of what actually moved.
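
As a contrast to a static suite, the following sketch selects validation based on what actually moved. It assumes a hand-written reverse-dependency map; the service names, suite names, and both helper functions are hypothetical and stand in for whatever live model a real system maintains.

```python
from collections import deque


def affected_nodes(dependents: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return the changed nodes plus everything that transitively depends on them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for downstream in dependents.get(node, set()):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


def select_suites(
    suites_by_node: dict[str, list[str]],
    dependents: dict[str, set[str]],
    changed: set[str],
) -> list[str]:
    """Pick only the suites attached to nodes the change can reach."""
    impacted = affected_nodes(dependents, changed)
    return sorted({s for node in impacted for s in suites_by_node.get(node, [])})


# Hypothetical graph: ledger-lib is used by payments, which is used by checkout.
dependents = {"ledger-lib": {"payments"}, "payments": {"checkout"}}
suites = {
    "payments": ["idempotency", "refunds"],
    "checkout": ["checkout-e2e"],
    "footer": ["footer-snapshot"],
}
print(select_suites(suites, dependents, {"ledger-lib"}))
# prints "['checkout-e2e', 'idempotency', 'refunds']"
print(select_suites(suites, dependents, {"footer"}))
# prints "['footer-snapshot']"
```

A change to a shared library pulls in the payment suites, while a footer change does not. A gate without such a map cannot tell the two apart.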

Second, advisory gates leak. An estimated 80% of developers bypass policy or guardrails when those guardrails slow them down. A check that is a non-blocking warning or a wiki page is being routed around. So you get the cost of the gate without the protection, and a green pipeline that certifies almost nothing.
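
The leak can be shown with a small sketch. The `GateResult` type and both helpers are illustrative assumptions, not a real CI system's API; the point is only how an advisory failure coexists with a green pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    enforced: bool  # False means advisory: a failure warns but never blocks


def merge_allowed(results: list[GateResult]) -> bool:
    """A merge is blocked only by failures on enforced gates."""
    return all(r.passed for r in results if r.enforced)


def leaking_gates(results: list[GateResult]) -> list[str]:
    """Advisory gates that failed and were silently routed around."""
    return [r.name for r in results if not r.passed and not r.enforced]


# Hypothetical pipeline run: the secret scan fails but is only advisory.
results = [
    GateResult("unit-tests", passed=True, enforced=True),
    GateResult("secret-scan", passed=False, enforced=False),
]
print(merge_allowed(results))   # prints "True"
print(leaking_gates(results))   # prints "['secret-scan']"
```

The pipeline reports green while a failed check sits in the output. Flipping `enforced` to `True` on that one gate is the smallest possible step out of this failure mode.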

The ceiling: your automation cannot reason about change, and your policy is not actually enforced. You have automated the easy, deterministic cases and left every judgment-dependent case back at stage two.

You are here if: you have rollback and CI gates, but they are change-blind and at least one critical guardrail is advisory.

05

Stage 4: Governed autonomy

The system maps itself, validates every change against that live map, acts within explicit policy, and proves what it did. This is the stage most teams have not seen articulated as a destination, so it gets conflated with "let the AI run unsupervised." It is the opposite. The defining principle: agents propose, humans authorize. Autonomy is real and bounded; accountability is preserved.

Four properties separate this stage from gated automation:

  • A change-aware model. A live System Graph of services, dependencies, and CI/CD means every proposed action is evaluated against current reality, not a stale diagram. This is what makes validation change-aware instead of running the same suite regardless of what moved.
  • Validation as an action, not a report. Testing Fleets plan, execute, observe, and maintain validation as the system evolves. The output is a verdict the control plane can act on, not a coverage number on a chart.
  • Governed remediation. Remediation is the hardest and most consequential part of the loop, which is exactly why it must be the most governed. Remediation Fleets propose scoped fixes; the Governance layer of policy, approval, and audit decides whether and how they execute. Unsupervised autonomous fixing is reckless. The governance is the engineering.
  • Evidence as a first-class output. Every loop produces an audit-ready record of what was proposed, who authorized it, what executed, and whether verification passed.

Consider a hypothetical fintech team shipping a dependency bump on a payments service. The System Graph identifies the affected downstream paths; Testing Fleets surface a regression in idempotency handling; the condition is reproduced deterministically; a Remediation Fleet proposes a scoped fix, but because this is a payments path, policy routes it for human authorization before anything executes; post-change validation confirms the fix and attaches evidence. The human is not babysitting every step. The human holds authority at the one decision that genuinely warrants it. This is also how prioritization gets honest: reachability-based analysis can shrink the exposure you have to treat as exploitable by 70-90%, because you act only on what is actually reachable in the live graph.
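
The "agents propose, humans authorize" loop from this scenario can be sketched as follows. Everything here is a hypothetical illustration under stated assumptions: `PROTECTED_PATHS`, `Proposal`, `AuditRecord`, and `run_remediation` are invented for the example and do not describe how Remediation Fleets or the Governance layer are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical policy: proposals touching these paths need a named human approver.
PROTECTED_PATHS = frozenset({"payments"})


@dataclass(frozen=True)
class Proposal:
    fix_id: str
    summary: str
    affected_paths: frozenset


@dataclass(frozen=True)
class AuditRecord:
    fix_id: str
    proposed: str
    authorized_by: Optional[str]
    executed: bool
    verification_passed: Optional[bool]


def requires_human(proposal: Proposal) -> bool:
    """Policy check: does the proposal touch any protected path?"""
    return bool(proposal.affected_paths & PROTECTED_PATHS)


def run_remediation(
    proposal: Proposal,
    approver: Optional[str],
    apply_fix: Callable[[Proposal], None],
    verify: Callable[[Proposal], bool],
) -> AuditRecord:
    """Execute a proposal only if policy allows it, and always return evidence."""
    if requires_human(proposal) and approver is None:
        # Held for authorization: nothing executes, but the hold is still recorded.
        return AuditRecord(proposal.fix_id, proposal.summary, None, False, None)
    apply_fix(proposal)
    return AuditRecord(
        fix_id=proposal.fix_id,
        proposed=proposal.summary,
        authorized_by=approver or "policy:auto-approve",
        executed=True,
        verification_passed=verify(proposal),
    )


proposal = Proposal(
    "fix-001",
    "Restore idempotency key check after dependency bump",
    frozenset({"payments"}),
)
held = run_remediation(proposal, None, lambda p: None, lambda p: True)
approved = run_remediation(proposal, "alice@example.com", lambda p: None, lambda p: True)
print(held.executed, approved.executed)  # prints "False True"
```

Both outcomes produce a record. The unapproved proposal is evidence too: it shows that policy held the change at the one decision that warranted a human.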

06

What to do Monday morning

You do not need a rip-and-replace to advance one stage. You need to find one critical path and move it forward deliberately.

  1. Locate your real stage. For your last five incidents, mark where the system stopped at "notified a human." That boundary is your true stage, regardless of your best automation.
  2. Make one advisory gate enforceable. If a check is a warning or a wiki page, it is being bypassed. Pick one and make it unavoidable.
  3. Govern one remediation instead of automating it blindly. Choose a fix class you would trust as a proposal but want to authorize. That is the shape of stage four.
  4. Demand evidence from one release. Require a single decision to produce an audit-ready record of what was checked and who approved it.

The aggregate cost of poor software quality is estimated at roughly $2.41 trillion, and a large share of it is the bill for systems that could see problems but could not act on them. Advancing a stage is how you stop paying it.

07

The bottom line
