Security and governance

The Governed-Autonomy Readiness Checklist for Regulated Industries

A pre-deployment checklist for compliance and risk officers evaluating governed autonomous agents in healthcare: policy-as-code, scoped permissions, signed capsules, attribution, and a kill switch.

Zof Reliability Team · Engineering and product

April 21, 2026 · 8 min read · Updated April 21, 2026

Summary

Autonomous agents are about to touch systems that hold protected health information, and the people who will answer for that in an audit are rarely the ones running the proof of concept. If you sign off on a control layer that lets agents propose and apply changes near clinical or claims systems, you own the consequences when an examiner asks what ran, who authorized it, and whether you can prove the control held. This checklist is what to verify before that signature, not after the incident.

The framing matters. A serious healthcare enterprise does not want more AI in its pipeline; it wants control over the AI already there. Roughly 41% of codebases are now AI-generated, and around 45% of AI coding tasks introduce a critical flaw or security issue. The change is happening with or without your governance program. The only real decision is whether the autonomy operating in your environment is governed or not. Governed autonomy means agents propose and humans authorize, with the authority encoded into the system rather than living in a policy document nobody reads at 2 a.m.

Below are five gates. Treat each as pass or fail. A vendor that cannot demonstrate all five against your boundary is not ready for a regulated deployment, no matter how good the demo looked.

1. Policy-as-code, not policy-as-meeting

The first question is where authority actually lives. If your guardrails are a wiki page, a change-advisory-board slot, and a reviewer's good intentions, they will be bypassed. This is not a hypothetical: roughly 80% of developers admit to routing around policy and guardrails when those controls slow them down. That is rarely malice. It is friction. Any authority model that depends on a human remembering to follow a document fails at exactly the moment it matters.

For a control layer to be defensible, policy has to be executable and evaluated on every change, automatically, before anything reaches a protected environment.

Verify:

Policies are expressed as code and version-controlled, with a history you can diff and attribute.
Every proposed change is evaluated against policy automatically. There is no path where an agent's action reaches a PHI-adjacent system without a policy check.
Policy decisions are change-aware. The engine knows whether a given change touches a clinical data store, a claims pipeline, or a logging label, and treats them differently. This is the role a live dependency map plays. Zof's System Graph gives the policy engine the context to know what a change actually reaches, so the same change is judged the same way every time.
The same policy applies to humans and agents. A control that only governs the robots is theater.

The test to run: ask the vendor to show you a policy denying a class of change, then watch an agent attempt that change and get stopped. If they can only describe it, it does not exist yet.
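To make the "executable policy" idea concrete, here is a minimal sketch in Python. It is not Zof's policy engine or any vendor's format; the names `Change`, `Policy`, `POLICIES`, and `evaluate`, and the resource labels, are assumptions invented for illustration. It shows a deny rule evaluated on every change, keyed on what the change reaches rather than on who proposed it, so humans and agents are judged identically.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Change:
    actor: str           # identity proposing the change (hypothetical field)
    actor_kind: str      # "human" or "agent"
    targets: frozenset   # resource classes the change reaches, e.g. from a dependency map

@dataclass(frozen=True)
class Policy:
    name: str
    denies: Callable[[Change], bool]  # returns True when the change must be blocked

# Hypothetical rules. In practice these would live in version control
# so that every edit to a rule is diffable and attributable.
POLICIES = [
    Policy("no-direct-change-to-clinical-store",
           lambda c: "clinical_data_store" in c.targets),
    Policy("claims-pipeline-frozen",
           lambda c: "claims_pipeline" in c.targets),
]

def evaluate(change: Change) -> list:
    """Return the names of all policies that deny this change (empty list means allowed)."""
    # Note that no rule inspects actor_kind: the same policy binds humans and agents.
    return [p.name for p in POLICIES if p.denies(change)]
```

Calling `evaluate` on an agent's change that reaches `clinical_data_store` returns the name of the denying policy, and the identical change from a human returns the same result. A change that only touches a logging label returns an empty list. A real engine would add versioned rule history and change-aware context; this sketch only illustrates the evaluation step.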

2. Scoped permissions and the maker-checker split

Ungoverned automation collapses two acts that your auditors expect to stay separate: proposing a change and authorizing it. An agent that writes a fix and applies it has merged the maker and the checker. In a regulated setting, that single property can turn a routine change into a finding.

Scoped permissions mean an agent's default authority is the narrowest it can be and still do useful work. Agents should plan, generate, test, and stage, producing a complete proposal with evidence attached. They should not be able to move that proposal into a protected environment on their own. The transition from proposed to authorized is a separate, policy-governed event with a named, role-appropriate human on the other side.

Verify:

Agents run propose-only by default for any path that touches PHI, clinical workflows, or regulated reporting.
Permissions are scoped to specific services, data stores, and environments through allowlists tied to identity, not broad service accounts.
Separation of duties is enforced by the system: whoever proposed a change cannot authorize it.
Approvals are risk-tiered so the gate does not become a rubber stamp. Low-blast-radius, high-confidence changes on non-regulated paths can move quickly under policy; anything touching reachable, regulated, or patient-facing paths requires explicit human authorization; low-confidence or boundary-crossing changes escalate.

The point of tiering is to spend scarce human attention on the genuinely risky minority. Reachability-based prioritization, focusing on what is actually exploitable rather than every theoretical issue, can mean 70 to 90% less exploitable exposure, which is also what makes it safe to let governed automation handle the low-risk long tail. Zof's Governance surface and approval model are built around this split.
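The tiering and maker-checker rules described in this section can be sketched as two small functions. This is an illustrative sketch under stated assumptions, not Zof's approval model: the `Proposal` fields, the `Tier` names, and the `CONFIDENCE_FLOOR` value of 0.8 are all hypothetical, and sending high-blast-radius changes on non-regulated paths to human authorization is an assumption the article does not spell out.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    FAST_UNDER_POLICY = "fast_under_policy"
    HUMAN_AUTHORIZATION = "human_authorization"
    ESCALATE = "escalate"

# Assumed threshold for "high confidence"; the article gives no number.
CONFIDENCE_FLOOR = 0.8

@dataclass(frozen=True)
class Proposal:
    proposer: str            # identity of the agent or human that made the proposal
    regulated_path: bool     # reachable, regulated, or patient-facing
    crosses_boundary: bool
    low_blast_radius: bool
    confidence: float        # hypothetical score between 0.0 and 1.0

def tier_for(p: Proposal) -> Tier:
    """Map a proposal to the approval tier it requires."""
    if p.crosses_boundary or p.confidence < CONFIDENCE_FLOOR:
        return Tier.ESCALATE
    if p.regulated_path or not p.low_blast_radius:
        return Tier.HUMAN_AUTHORIZATION
    return Tier.FAST_UNDER_POLICY

def can_authorize(p: Proposal, approver: str, approver_roles: set, required_role: str) -> bool:
    """Separation of duties: the proposer can never authorize their own proposal."""
    if approver == p.proposer:
        return False
    return required_role in approver_roles
```

The design choice worth noting is that `can_authorize` rejects the proposer before it looks at roles, so holding the right role never overrides the maker-checker split.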

3. Signed capsules: a stable artifact, not a runtime improvisation

Most AI-does-the-testing stories fail audit for the same reason. The thing that ran was synthesized at runtime and is gone afterward. There is no stable artifact to review, no signature to verify, no way to prove that what executed near your clinical systems was the thing a human approved and nothing else.

A signed capsule inverts that. It is an immutable, versioned, approved package with a constrained manifest that defines exactly what may run. The work is assembled and reviewed before it can execute, signed, promoted through versioning, and only then admitted to run. The manifest is the scope, the signature is the attestation, and the version is the chain of custody. This is the unit of work behind Zof's Edge Runners, which execute inside your boundary and emit audit-ready evidence outward.

Verify:

The unit of execution is a signed, versioned artifact, not an ad hoc script generated on the fly.
Each capsule carries a manifest scoping exactly what it may touch, and nothing outside that manifest can execute.
Capsules are promoted through versioned stages with approval, and you can reproduce any past execution from its signed artifact.
For your most sensitive segments, execution and evidence can stay local. You decide what leaves the boundary, if anything. This is essential where PHI cannot transit to a vendor's cloud; see the secure-enclave deployment model.

The test to run: ask to see the exact artifact that executed in a prior run, its signature, and who approved it. Audit-readiness is the ability to answer that in minutes.
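As a rough illustration of "the manifest is the scope, the signature is the attestation", the sketch below signs a capsule and refuses to admit it if the content was altered or the requested target is outside the manifest. It assumes a hypothetical capsule layout (`manifest.allowed_targets`) and uses an HMAC from Python's standard library as a stand-in; a production system would more likely use asymmetric signatures with managed keys, and nothing here describes how Zof's Edge Runners actually sign or verify.

```python
import hashlib
import hmac
import json

def canonical_bytes(capsule: dict) -> bytes:
    """Serialize the capsule deterministically so the same content always signs the same way."""
    return json.dumps(capsule, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(capsule: dict, key: bytes) -> str:
    """Produce a hex signature over the whole capsule, manifest and version included."""
    return hmac.new(key, canonical_bytes(capsule), hashlib.sha256).hexdigest()

def admit(capsule: dict, signature: str, key: bytes, requested_target: str) -> bool:
    """Admit execution only if the signature matches and the target is inside the manifest."""
    if not hmac.compare_digest(sign(capsule, key), signature):
        return False  # content differs from what was approved and signed
    # "allowed_targets" is a hypothetical manifest field used for illustration.
    return requested_target in capsule["manifest"]["allowed_targets"]
```

With a capsule signed at approval time, `admit` returns `True` for a target listed in the manifest, `False` for any other target, and `False` if any field, including the version, changed after signing.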

4. Attribution: the audit trail as a byproduct

For a compliance officer, the deciding question is usually the last one an examiner asks. When a change went live near regulated data, can you prove who authorized it, on what evidence, and that the control was not bypassed? If answering takes a forensic reconstruction, you do not have attribution. You have logs.

Real attribution means every proposal, every piece of validation evidence, every approval and rejection, and the system context at the moment of decision are linked into one immutable record. The trail is a byproduct of how the system runs, not a separate logging effort bolted on later.

Verify:

Every agent action carries an identity. There are no anonymous or shared-credential operations near regulated systems.
The proposal, the validation evidence, and the authorization are one linked artifact, not three systems you correlate by hand.
Evidence existed before approval. You can show the validation that the authorizer saw, at the version they saw it.
The full chain exports cleanly to your GRC tooling and to a regulator, without a custom project to assemble it.
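One common way to link proposal, evidence, and authorization into a single tamper-evident record is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how such a record could be built, not a description of Zof's evidence format; the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry in a chain

def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_event(chain: list, event: dict) -> list:
    """Return a new chain with the event appended and linked to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    return chain + [entry]

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks verification."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["prev"], entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Appending a proposal, then its validation evidence, then the approval yields one ordered record. Because the approval entry is chained after the evidence entry, the order "evidence existed before approval" is part of what `verify_chain` checks, and altering the evidence afterward makes verification fail.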

Zof's closed loop (Understand, Test, Reproduce, Remediate, Verify) produces this evidence at each stage because validation runs through coordinated Testing Fleets rather than static scripts that leave no defensible record. The Reproduce step matters in healthcare specifically: proving that a failing behavior was reproduced before a fix was applied is what separates a defensible change from a hopeful one.

5. Kill switch: governed remediation you can stop

Remediation is the hardest and most consequential part of the loop, which is exactly why it must be the most governed. Letting agents fix code near clinical systems unsupervised is reckless. Governance is the engineering here, not a feature added later, and a credible kill switch is part of that engineering.

A kill switch is more than a stop button. It is the assurance that you can halt autonomous action, scope the halt precisely, and recover safely without manual archaeology.

Verify:

You can halt agent activity globally and per scope (one service, one environment, one capsule class) without taking down the whole platform.
A halt is itself a governed, attributed event, with a named authority and a record.
Rollback is verified before any change is considered closed, and you can prove the rollback worked.
Emergency paths still require named approvers. There is no break-glass mode that quietly removes oversight.
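The first two checks above, a halt that can be scoped and a halt that is itself attributed, can be sketched as follows. The `KillSwitch` class and its scope strings (`service:`, `env:`, `capsule:`, and `*` for global) are hypothetical conventions chosen for this example, not a real product interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KillSwitch:
    halted_scopes: set = field(default_factory=set)
    log: list = field(default_factory=list)  # every halt is recorded with its authority

    def halt(self, scope: str, authority: str, reason: str) -> None:
        """Halt one scope. A halt without a named authority is rejected."""
        if not authority:
            raise ValueError("a halt must name the authority who ordered it")
        self.halted_scopes.add(scope)
        self.log.append({
            "action": "halt",
            "scope": scope,
            "authority": authority,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def may_run(self, service: str, environment: str, capsule_class: str) -> bool:
        """An agent action may run only if none of its scopes, nor the global scope, is halted."""
        scopes = {"*", f"service:{service}", f"env:{environment}", f"capsule:{capsule_class}"}
        return self.halted_scopes.isdisjoint(scopes)
```

Halting `service:claims-api` blocks actions against that service while leaving other services running, and halting `*` stops everything. Verified rollback and named approvers on emergency paths are separate controls that this sketch does not cover.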

Governed Remediation Fleets operate under this constraint by design: agents propose the fix, humans authorize it, and you retain the ability to stop and reverse. Autonomy here is governed, not unsupervised.

What to do Monday morning

You do not need a year-long program to start. A conservative sequence:

Inventory where agents already act. Find every place an automated tool can write to a PHI-adjacent environment today.
Run validation, not remediation, first. Let the loop prove out under human authorization before granting any governed fixing.
Pilot in local-only evidence mode. Prove value with zero egress before deciding what, if anything, leaves the boundary.
Demand the five demonstrations above. Make the vendor show each gate working, not describe it.

The bottom line
