Security and governance

Agents Propose, Humans Authorize: A Reference Architecture for Governed Autonomy

A reference architecture for letting agents act on production safely: the four control surfaces (policy, approval, evidence, and attribution) and how they wire into the loop.

Zof Reliability Team · Engineering and product

June 16, 2026 · 8 min read · Updated June 16, 2026

Summary

Agents are already proposing changes to your production systems. The question that decides whether that ends in velocity or in a postmortem is architectural: what control surfaces sit between an agent's proposal and a live deployment, and can you prove they held? Most teams have one or two of these surfaces, bolted on ad hoc. This is the reference architecture for all four.

There is no canonical design for governed autonomy yet. We have plenty of agents that can write and test code, and almost no shared vocabulary for the machinery that lets them act on real systems without ceding control. The result is that every team reinvents it badly: a Slack approval here, a CI log there, a policy doc nobody reads. This piece defines the four control surfaces that a serious autonomy architecture needs, and how they wire into the closed loop so they reinforce each other instead of fighting.

Why this needs an architecture, not a feature

The pressure is structural, not hypothetical. Roughly 41% of codebases are now AI-generated, and industry research suggests around 45% of AI coding tasks introduce a critical flaw or security issue. The cost of poor software quality sits near $2.41 trillion. You are not deciding whether to admit fast, occasionally wrong systems into your change pipeline. They are already there, and they generate change faster than human review can absorb.

The common response is to reach for a better model or sharper prompts. That is the wrong layer. A better model still produces a non-trivial defect rate, and you cannot audit a probability distribution. What an enterprise needs is not more intelligence inside the agent but a control layer around it. As we have argued in the control-layer thesis, AI is missing a control layer, not more models.

A control layer is not one feature. It is a small number of surfaces that every autonomous action must pass through, each answering a distinct question:

Policy: *is this action allowed at all?*
Approval: *who authorizes it, and on what evidence?*
Evidence: *what proves the change is safe?*
Attribution: *who or what did this, and can we prove it later?*

Miss any one and the others degrade. Approval without evidence is a rubber stamp. Evidence without attribution is unprovable under audit. Policy without approval is advisory, and advisory governance gets bypassed: about 80% of developers admit to routing around guardrails when those guardrails slow them down.

Surface 1: Policy, the boundary of allowed action

Policy is the first gate because it is the cheapest. It answers a yes/no question before any work happens: is an agent permitted to touch this surface, in this environment, under these conditions? Everything downstream assumes the action was admissible to begin with.

The failure mode here is policy-as-prose. When the rules live in a wiki, a runbook, or a quarterly change-advisory meeting, they are unenforceable at machine speed and they get skipped at exactly the moment they matter. Policy has to be code, evaluated inline, with no path around it.

Two design choices make policy real. First, express authority along axes that map to risk, not to surface signals like lines changed or file count: blast radius, data sensitivity, environment, and reachability. A three-line change to a shared authentication library is more dangerous than a 600-line change to an isolated internal tool, and only a policy that reasons about the dependency graph knows that. This is why policy cannot be evaluated from the diff alone. It needs the System Graph, a live map of services, dependencies, and CI/CD that tells the policy engine what a change actually touches. Second, default agents to propose-only. They may plan, generate, and stage a change, but moving it into a protected environment is a separate, governed event. That separation is the architecture's backbone.
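To make those two choices concrete, here is a minimal sketch of a policy check that reasons about the dependency graph rather than the diff. Everything in it is an assumption made for illustration: the `DEPENDENTS` dictionary is a toy stand-in for the System Graph, and `Action`, `blast_radius`, `evaluate_policy`, and `PROTECTED_ENVIRONMENTS` are hypothetical names, not a product API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the System Graph: each service maps to the
# services that depend on it directly.
DEPENDENTS = {
    "auth-lib": ["checkout", "accounts", "admin"],
    "checkout": ["storefront"],
    "internal-tool": [],
}

PROTECTED_ENVIRONMENTS = {"production"}  # assumption for this sketch


def blast_radius(service, dependents):
    """Count the services that depend on `service`, directly or transitively."""
    seen = set()
    stack = [service]
    while stack:
        node = stack.pop()
        for downstream in dependents.get(node, []):
            if downstream not in seen and downstream != service:
                seen.add(downstream)
                stack.append(downstream)
    return len(seen)


@dataclass(frozen=True)
class Action:
    actor: str          # "agent" or "human"
    verb: str           # "propose" or "deploy"
    service: str
    environment: str
    lines_changed: int  # recorded, but deliberately not used as a risk signal


def evaluate_policy(action, dependents):
    """Return (allowed, reason, blast_radius) for a requested action."""
    radius = blast_radius(action.service, dependents)
    # Propose-only default: an agent may stage a change, but may not move it
    # into a protected environment on its own.
    if (action.actor == "agent" and action.verb == "deploy"
            and action.environment in PROTECTED_ENVIRONMENTS):
        return False, "agents are propose-only in protected environments", radius
    return True, "admissible", radius


small_but_wide = Action("agent", "propose", "auth-lib", "production", 3)
large_but_isolated = Action("agent", "propose", "internal-tool", "production", 600)
print(evaluate_policy(small_but_wide, DEPENDENTS))
print(evaluate_policy(large_but_isolated, DEPENDENTS))
```

In this toy graph the three-line change to `auth-lib` reaches four downstream services while the 600-line change to `internal-tool` reaches none, which is the ordering a diff-size heuristic would get backwards.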

Surface 2: Approval, encoding who authorizes

Policy decides admissibility. Approval decides authority: which human, by role, signs off on a given class of change. This is where the governing principle lives: agents propose, humans authorize.

The trap is uniform approval. If a copy tweak and a settlement-logic change wait in the same queue with the same reviewer and SLA, the reviewer rubber-stamps both to keep up, and the one change that mattered slips through with the same glance as the rest. A gate that approves everything protects nothing.

The architecture's answer is to tier approval by blast radius, derived from the System Graph rather than self-declared by the author:

Auto-merge for low-criticality nodes with full validation coverage and no policy-sensitive surface. No human gate; the evidence record is the authority.
Notify-and-proceed for moderate-criticality changes with complete coverage. The change auto-merges but posts an audit entry to the owning team.
Single approver for high-criticality nodes or partial coverage.
Multi-party / change-control for regulated data, auth, payments, and irreversible operations.
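The four tiers above can be sketched as a single routing function. This is an illustration under stated assumptions, not a definitive rule set: the `Tier` enum, the `approval_tier` function, and its parameter names are hypothetical, the inputs are assumed to come from the System Graph rather than from the author, and escalating a policy-sensitive change to a single approver is our assumption, since the tiers only say that auto-merge requires no policy-sensitive surface.

```python
from enum import Enum


class Tier(Enum):
    AUTO_MERGE = "auto-merge"
    NOTIFY_AND_PROCEED = "notify-and-proceed"
    SINGLE_APPROVER = "single approver"
    MULTI_PARTY = "multi-party / change-control"


def approval_tier(criticality, coverage_complete, policy_sensitive,
                  regulated_or_irreversible):
    """Route a change to an approval tier, strictest rule first."""
    if criticality not in ("low", "moderate", "high"):
        raise ValueError(f"unknown criticality: {criticality!r}")
    # Regulated data, auth, payments, and irreversible operations.
    if regulated_or_irreversible:
        return Tier.MULTI_PARTY
    # High-criticality nodes or partial coverage. Treating a policy-sensitive
    # surface the same way is an assumption of this sketch.
    if criticality == "high" or not coverage_complete or policy_sensitive:
        return Tier.SINGLE_APPROVER
    # Moderate criticality with complete coverage: merge, then notify owners.
    if criticality == "moderate":
        return Tier.NOTIFY_AND_PROCEED
    # Low criticality, full coverage, no policy-sensitive surface.
    return Tier.AUTO_MERGE


print(approval_tier("low", True, False, False).value)
print(approval_tier("moderate", True, False, False).value)
print(approval_tier("low", False, False, False).value)
print(approval_tier("moderate", True, False, True).value)
```

Checking the strictest condition first means a change can only be routed down to a lighter tier by failing every stricter test, never by matching a lenient rule early.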

The objective is for the first two tiers to absorb the overwhelming majority of changes, so human attention concentrates on the genuinely dangerous minority. We go deeper on calibration in designing approval gates that don't become bottlenecks. Get tiering wrong and you have rebuilt the original queue with extra steps.

Surface 3: Evidence, what makes approval defensible

An approval is only as trustworthy as the evidence behind it. The reason most teams cannot safely auto-merge is that they have no defensible proof a change is good, so they fall back to a human eyeball as a weak substitute for a test that should have run.

Invert that. Every proposal should arrive at the gate already carrying the evidence the gate needs: which paths were exercised, what regressed, whether the original failing behavior was reproduced before the fix, and what reachability analysis says about exposure. That last point sharpens security gates specifically. Reachability-based prioritization, which asks whether a flaw sits on a path actually reachable in your deployed system, can mean 70 to 90% less exploitable exposure to triage. An unreachable flaw need not block a release; a reachable one routes straight to the strictest tier.
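As a rough illustration of reachability-based prioritization, the sketch below walks a call graph from the deployed entry points and routes each finding accordingly. The call graph, the function names in it, and the `reachable_from` and `triage` helpers are all hypothetical; a real analysis would work from the deployed system's actual graph.

```python
from collections import deque


def reachable_from(entry_points, calls):
    """Breadth-first walk of the call graph starting at the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        for callee in calls.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage(findings, entry_points, calls):
    """Reachable flaws go to the strictest tier; unreachable ones do not block."""
    live = reachable_from(entry_points, calls)
    return {f: ("strictest-tier" if f in live else "non-blocking")
            for f in findings}


# Hypothetical call graph: caller -> callees.
calls = {
    "api.handle_payment": ["ledger.post"],
    "ledger.post": ["crypto.sign"],
    "legacy.export": ["xml.parse_unsafe"],  # nothing deployed calls this
}
result = triage(["crypto.sign", "xml.parse_unsafe"],
                ["api.handle_payment"], calls)
print(result)
```

Here the flaw in `crypto.sign` sits on a path from a live entry point and is escalated, while the one in `xml.parse_unsafe` is only called from code that no entry point reaches.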

Evidence is not a static suite passing. It is validation that knows what changed and what depends on it. That is the job of coordinated Testing Fleets: agents that plan, execute, and maintain change-aware validation as the system evolves, rather than scripts that ignore the dependency graph and quietly rot. Watch for coverage laundering: a change that shows "tests passed" while the validation never exercised the changed path. The gate must read coverage *of the change*, not aggregate green.
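One way to guard against coverage laundering is to intersect the lines a change touched with the lines validation actually executed. The sketch below assumes both are available as per-file sets of line numbers; `change_coverage`, `gate_passes`, and the file names are hypothetical, and treating an empty change as fully covered is a choice made for this example.

```python
def change_coverage(changed_lines, executed_lines):
    """Fraction of changed lines that validation actually executed.

    Both arguments map a file path to a set of line numbers.
    """
    total = sum(len(lines) for lines in changed_lines.values())
    if total == 0:
        return 1.0  # nothing changed, so nothing is left unexercised
    covered = sum(len(lines & executed_lines.get(path, set()))
                  for path, lines in changed_lines.items())
    return covered / total


def gate_passes(suite_green, changed_lines, executed_lines, required=1.0):
    """A green suite is necessary but not sufficient."""
    return suite_green and change_coverage(changed_lines, executed_lines) >= required


changed = {"settlement.py": {10, 11, 12, 13}}
executed = {
    "settlement.py": {1, 2, 3, 10},      # only one changed line was run
    "utils.py": set(range(1, 51)),       # plenty of unrelated green coverage
}
print(change_coverage(changed, executed))
print(gate_passes(True, changed, executed))
```

The suite is green and aggregate coverage looks healthy, yet only one of the four changed lines ran, so the gate refuses the change.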

Surface 4: Attribution, proving who did what

Attribution is the surface most teams forget, and it is the one that decides whether the other three survive an audit. It answers the question an examiner actually asks: can you prove that *this specific change* was authorized by someone permitted to authorize it, on evidence that existed *before* approval, and that no control was bypassed?

"Do you have logs" is the wrong test. Logs can be edited and they decouple the proposal from the evidence from the approval. Attribution requires those to be a single immutable, linked artifact: the proposal, the validation evidence, the System Graph context at the moment of decision, and the authorization (or rejection), bound together. The trail should be a byproduct of how the system runs, not a reconstruction project that starts when the examiner arrives.

Two constraints raise the bar. Auto-merged changes, the ones no human watched, need the *same* evidence record as reviewed ones. The absence of a human in the path strengthens the attribution requirement rather than relaxing it. And when data cannot leave your boundary, attribution still has to hold: Edge Runners execute as signed capsules inside a secure enclave and emit audit-ready evidence outward, so the proof comes to you while the data stays put.
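The single linked artifact described in this section can be approximated with a hash chain. This is a minimal sketch under assumptions, not the product's record format: `seal_record`, `verify_chain`, and the field names are hypothetical. A hash chain on its own makes tampering detectable rather than impossible, so real immutability would still need append-only storage or signatures on top.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in the chain


def digest(obj):
    """SHA-256 over a canonical JSON encoding of the object."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def seal_record(proposal, evidence, graph_context, authorization, prev_hash):
    """Bind proposal, evidence, graph context, and decision into one record."""
    body = {
        "proposal": proposal,
        "evidence": evidence,
        "graph_context": graph_context,
        "authorization": authorization,
        "prev_hash": prev_hash,
    }
    return {**body, "record_hash": digest(body)}


def verify_chain(records):
    """Recompute every hash and check that each record links to the one before."""
    prev = GENESIS
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != prev or digest(body) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True


first = seal_record({"change": "fix-rounding"}, {"change_coverage": 1.0},
                    {"blast_radius": 0}, {"tier": "auto-merge", "by": None},
                    GENESIS)
second = seal_record({"change": "rotate-keys"}, {"change_coverage": 1.0},
                     {"blast_radius": 4}, {"tier": "multi-party", "by": ["alice", "bob"]},
                     first["record_hash"])
chain = [first, second]
print(verify_chain(chain))

first["evidence"]["change_coverage"] = 0.2  # edit the evidence after the fact
print(verify_chain(chain))
```

Note that the auto-merged record carries the same fields as the reviewed one; only the authorization entry differs.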

How the four wire into the loop

These surfaces are not a checklist run once. They map onto the closed loop (Understand, Test, Reproduce, Remediate, Verify), and each pass through the loop writes to each surface. *Understand* is the System Graph feeding policy. *Test* and *Reproduce* generate evidence. *Remediate* is the propose step that approval gates, and remediation is the hardest, most critical surface to govern, which is exactly why letting agents fix code unsupervised is reckless and the approval machinery is the engineering. *Verify* confirms the change held and closes the attribution record. Remove any surface and the loop leaks: changes act without admissibility, approve without proof, or land without a trail.
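The stage-to-surface mapping in this section is small enough to write down directly. The sketch below is only an illustration: `STAGE_TO_SURFACE` and `missing_surfaces` are hypothetical names, and the check simply reports which surfaces a loop pass never wrote to.

```python
SURFACES = ("policy", "evidence", "approval", "attribution")

# Which surface each loop stage feeds, following the mapping in the text.
STAGE_TO_SURFACE = (
    ("understand", "policy"),
    ("test", "evidence"),
    ("reproduce", "evidence"),
    ("remediate", "approval"),
    ("verify", "attribution"),
)


def missing_surfaces(stages_run):
    """Return the surfaces that no completed stage wrote to, in fixed order."""
    written = {surface for stage, surface in STAGE_TO_SURFACE
               if stage in stages_run}
    return [surface for surface in SURFACES if surface not in written]


full_pass = {"understand", "test", "reproduce", "remediate", "verify"}
print(missing_surfaces(full_pass))
print(missing_surfaces(full_pass - {"verify"}))
```

A pass that skips Verify leaves the attribution record open, which is the "land without a trail" leak.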

The bottom line
