AI Agents

Agents Propose, Humans Authorize: The Operating Model for AI in Production

A concrete operating model for AI in production: policy, approval, and audit. The governed middle between 'no humans' hype and ungoverned autonomy.

Zof Reliability Team · Engineering and Product

January 13, 2026 · 7 min read · Updated January 13, 2026

01

The two failure modes you're choosing between

Start by naming what you're avoiding, because the right design sits precisely between them.

Ungoverned autonomy is the failure mode of the optimists. An agent has write access to production, decides on its own that a fix is correct, and applies it. Sometimes it's right. The times it's wrong, you learn about it from a customer, a regulator, or a postmortem, because nothing required the agent to prove its reasoning before acting and nothing recorded who or what made the call. Remediation is the hardest, most consequential part of any automated system. Letting an agent perform it unsupervised isn't autonomy. It's an incident with a delayed timestamp.

Governance theater is the failure mode of the pessimists. Every change, regardless of risk, waits in the same review queue behind the same overloaded human. The reviewer, drowning, rubber-stamps to keep up. The control feels safe and protects nothing, because attention spread evenly across everything is attention paid to nothing. Worse, it trains your best engineers to route around it. Industry research puts the share of developers who bypass policy or guardrails near 80%. A control that gets evaded four times out of five is not a control.

The operating model has to be selective enough to avoid the bottleneck and rigorous enough to avoid the blind trust. That means it can't be a vibe. It has to be three concrete planes working together: policy, approval, and audit.

02

Plane one: policy decides what is allowed at all

Policy is the layer that encodes your organization's authority into something machine-readable before any agent proposes anything. It is the difference between "we generally try to review database migrations" and a rule that fires every time, identically, without a human remembering to invoke it.

Good policy is expressed in terms of blast radius, not surface signals. Lines changed and file count are poor proxies for danger. A six-hundred-line change to an isolated internal tool is safer than a three-line change to a shared authentication library that forty services depend on. To reason this way, policy needs a live model of the system underneath it. A System Graph that maps services, dependencies, and CI/CD into one change-aware model lets a policy ask the question that matters: does this change touch a node that fans out to critical paths, handles regulated data, or sits on the revenue path? Without that map, every policy is guessing.
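As a rough illustration of reasoning by blast radius rather than by lines changed, the sketch below walks a small dependency map backwards from a changed node to everything that consumes it. The service names, the `DEPENDS_ON` map, and the `CRITICAL` set are hypothetical; this is a minimal sketch of the idea, not a description of how any particular System Graph is implemented.

```python
from collections import deque

# Hypothetical dependency edges: service -> services it depends on.
DEPENDS_ON = {
    "checkout": ["auth-lib", "payments"],
    "profile": ["auth-lib"],
    "search": ["auth-lib"],
    "payments": ["auth-lib"],
    "internal-report-tool": [],
    "auth-lib": [],
}

# Hypothetical set of nodes the organization treats as critical paths.
CRITICAL = {"checkout", "payments"}


def dependents_of(node, depends_on):
    """Return every service that transitively depends on `node`."""
    # Invert the edges so we can walk from a dependency to its consumers.
    reverse = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(service)

    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for consumer in reverse.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


def blast_radius(node, depends_on, critical):
    """Summarize how far a change to `node` can propagate."""
    affected = dependents_of(node, depends_on)
    return {
        "node": node,
        "fan_out": len(affected),
        "critical_paths_touched": sorted((affected | {node}) & critical),
    }


if __name__ == "__main__":
    print(blast_radius("auth-lib", DEPENDS_ON, CRITICAL))
    print(blast_radius("internal-report-tool", DEPENDS_ON, CRITICAL))
```

In this toy graph a change to `auth-lib` reaches four services, two of them critical, while a change to `internal-report-tool` reaches none, regardless of how many lines either change contains.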

Policy also defines the non-negotiables: the surfaces no agent may ever modify without explicit human authorization. Authentication, authorization, payments, regulated data flows, irreversible operations. This is the one list to be conservative about. Everything not on it becomes a candidate for governed automation.
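One way to make the non-negotiables machine-readable is a small, explicit map from protected surface to the paths that belong to it. The surface names follow the list above; the path patterns and the `NEVER_AUTOMATE` structure are hypothetical and would need to match a real repository layout.

```python
from fnmatch import fnmatchcase

# Hypothetical never-automate list: protected surface -> path patterns.
# The patterns are illustrative only and do not come from a real repository.
NEVER_AUTOMATE = {
    "authentication": ["services/auth/*", "libs/auth-lib/*"],
    "payments": ["services/payments/*"],
    "irreversible operations": ["migrations/*drop*"],
}


def protected_surfaces(changed_paths):
    """Return the protected surfaces touched by a set of changed paths."""
    hits = set()
    for surface, patterns in NEVER_AUTOMATE.items():
        for path in changed_paths:
            # fnmatchcase's "*" also matches "/" so a pattern covers subtrees.
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                hits.add(surface)
    return sorted(hits)


if __name__ == "__main__":
    change = ["services/auth/handlers/login.py", "docs/readme.md"]
    print(protected_surfaces(change))
```

Any non-empty result means the change requires explicit human authorization, whatever else the validation evidence shows.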

03

Plane two: approval routes the proposal to the right authority

This is where "agents propose, humans authorize" stops being a tagline and becomes a routing decision. An agent assembles a change, runs validation against it, and produces a proposal. Approval decides what happens next, and the entire art is matching the rigor of the gate to the risk of the change.

A workable model tiers the decision by blast radius:

Tier | What it covers | Who authorizes
Auto-merge | Low-criticality nodes, full validation, no policy-sensitive surfaces | The system, with a recorded decision
Notify and proceed | Moderate criticality, full coverage | The system, with an async alert to the owning team
Single approver | High-criticality node, partial coverage, or a touched data path | One named human
Change-control | Regulated data, auth, payments, irreversible ops, or any hard-policy failure | Two approvers or a change-advisory step

The goal is for the top two tiers (auto-merge and notify and proceed) to absorb the overwhelming majority of changes, so that human authorization concentrates on the genuinely dangerous minority. If most of your changes are landing in the bottom two tiers (single approver and change-control), the tiering is miscalibrated and you've rebuilt the bottleneck.
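The tiers can be read as an ordered set of checks, strictest first. The sketch below is one possible encoding under stated assumptions: the `ChangeSignals` fields and the `derive_tier` function are hypothetical names, and treating an unrecognized criticality value as needing a human is an added fail-closed assumption, not something the tiers themselves specify.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTO_MERGE = "auto-merge"
    NOTIFY_AND_PROCEED = "notify and proceed"
    SINGLE_APPROVER = "single approver"
    CHANGE_CONTROL = "change-control"


@dataclass(frozen=True)
class ChangeSignals:
    """Hypothetical inputs, computed by the system rather than the author."""
    criticality: str                 # "low", "moderate", or "high"
    full_validation: bool            # False means partial coverage
    touches_data_path: bool = False
    protected_surfaces: tuple = ()   # e.g. ("payments",)
    hard_policy_failure: bool = False


def derive_tier(signals):
    """Map signals to a tier, checking the strictest conditions first."""
    if signals.protected_surfaces or signals.hard_policy_failure:
        return Tier.CHANGE_CONTROL
    if (
        signals.criticality == "high"
        or not signals.full_validation
        or signals.touches_data_path
    ):
        return Tier.SINGLE_APPROVER
    if signals.criticality == "moderate":
        return Tier.NOTIFY_AND_PROCEED
    if signals.criticality == "low":
        return Tier.AUTO_MERGE
    # Assumption: an unknown criticality fails closed to a human approver.
    return Tier.SINGLE_APPROVER


if __name__ == "__main__":
    print(derive_tier(ChangeSignals("low", True)).value)
    print(derive_tier(ChangeSignals("low", True, protected_surfaces=("payments",))).value)
```

Ordering the checks from strictest to most permissive means a change that qualifies for two tiers always lands in the stricter one.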

The decisive detail: the tier is derived, not declared. An author cannot label their own change low-risk to skip review. The system computes the tier from the graph and the policy. Coordinated Testing Fleets plan and execute validation that is aware of what changed and what depends on it, so the proposal reaching a gate already carries concrete evidence: which paths were exercised, what regressed, and what reachability analysis says about exposure. That last point has real leverage. Reachability-based prioritization, which asks whether a flaw sits on a path actually reachable in your deployed system, can cut the exploitable exposure you have to triage by 70 to 90%. A vulnerability in unreachable code doesn't have to block a release; a reachable one routes straight to change-control. When the hard work of fixing is involved, governed Remediation Fleets propose the change and stage it. They do not authorize themselves into production.
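Reachability-based prioritization can be pictured as a graph walk from the deployed entry points. The sketch below is a simplified illustration: the call graph, function names, and finding IDs are all hypothetical, and real reachability analysis has to cope with dynamic dispatch, reflection, and configuration that a static map like this ignores.

```python
def reachable_from(entry_points, calls):
    """Return every function reachable from the given entry points."""
    seen = set(entry_points)
    stack = list(entry_points)
    while stack:
        fn = stack.pop()
        for callee in calls.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def triage(findings, entry_points, calls):
    """Split findings by whether the flawed function is reachable."""
    live = reachable_from(entry_points, calls)
    return {
        "route_to_change_control": [f for f in findings if f["function"] in live],
        "track_without_blocking": [f for f in findings if f["function"] not in live],
    }


# Hypothetical call graph: caller -> callees.
CALLS = {
    "handle_request": ["parse_body", "render"],
    "parse_body": ["yaml_load"],
    "legacy_export": ["xml_parse"],  # no deployed entry point calls this
}
ENTRY_POINTS = ["handle_request"]

# Hypothetical findings; the IDs are placeholders, not real advisories.
FINDINGS = [
    {"id": "F-1", "function": "yaml_load"},
    {"id": "F-2", "function": "xml_parse"},
]

if __name__ == "__main__":
    print(triage(FINDINGS, ENTRY_POINTS, CALLS))
```

Here `F-1` sits on a path the request handler actually exercises and would route to change-control, while `F-2` lives in code no entry point reaches and can be tracked without blocking the release.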

04

Plane three: audit makes the decision defensible later

The output of this operating model is not a green check. It's a record. For every change, you should be able to reconstruct what was proposed, what was validated, what the evidence showed, which tier it landed in, and who or what authorized it.

This matters most exactly where it's easiest to neglect: the auto-merged changes. The ones no human looked at are the ones whose audit trail nobody is watching, so the absence of a human in the path should raise the bar on the evidence, not lower it. When changes execute inside a customer boundary or a regulated enclave, the requirement is stricter still. Edge Runners run as signed capsules and emit audit-ready evidence from inside the boundary, so the record survives a compliance review instead of living in a CI log someone can edit. This is what Governance means as engineering rather than as a policy PDF: policy, approval, and audit as first-class, queryable configuration.
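To make "a record someone cannot quietly edit" concrete, the sketch below chains each audit entry to the hash of the previous one, a generic tamper-evidence technique. It is not a description of how Edge Runners or signed capsules work; the field names and the `append_record` and `verify` helpers are hypothetical. A hash chain only detects edits if the latest hash is also signed or stored somewhere the editor cannot reach, which this sketch leaves out.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in the chain


def _digest(prev_hash, entry):
    """Hash the previous hash together with a canonical form of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log, entry):
    """Append an entry, linking it to the record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {
        "entry": entry,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, entry),
    }
    log.append(record)
    return record


def verify(log):
    """Return True only if no record was altered, removed, or reordered."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    log = []
    # Hypothetical entries covering what was proposed, validated, and authorized.
    append_record(log, {
        "change": "change-001",
        "tier": "auto-merge",
        "evidence": {"paths_exercised": 12, "regressions": 0},
        "authorized_by": "system",
    })
    append_record(log, {
        "change": "change-002",
        "tier": "single approver",
        "evidence": {"paths_exercised": 30, "regressions": 1},
        "authorized_by": "named-human",
    })
    print(verify(log))
    log[0]["entry"]["authorized_by"] = "someone-else"  # simulate a quiet edit
    print(verify(log))
```

Note that the auto-merged entry carries the same evidence fields as the human-approved one, in line with the point that the absence of a human should raise the bar on the record, not lower it.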

05

The loop these three planes run

Policy, approval, and audit aren't a one-time gate. They run continuously, around a closed loop: understand the system, test against it, reproduce what fails, remediate under governance, and verify the fix held. The loop is what keeps the operating model honest as the system changes underneath it. The stakes for getting it wrong keep rising: roughly 41% of codebases are now AI-generated, around 45% of AI coding tasks introduce a critical flaw or security issue, and the cost of poor software quality runs near $2.41 trillion. Generation got cheap. Accountable validation did not.

06

What to do Monday morning

You don't need a platform migration to start operating this way. You need to make four things explicit.

  • Find the decision-maker. Ask who, or what, actually authorizes a release today. If the answer is one tired human reading five dashboards, you have a control gap.
  • Write your never-automate list. Name the surfaces that always require human authorization. Be conservative here and only here.
  • Derive risk, don't declare it. Replace author and file-count heuristics with blast-radius signals from a dependency model.
  • Demand evidence, not status. "Tests passed" is a status. "Here is what we tested, found, fixed, and who signed off" is evidence, and only the second one survives an audit.
07

The bottom line
