Security and governance

Kill Switches and Circuit Breakers: Designing Graceful Stand-Down for Reliability Agents

An SRE's guide to designing kill switches, circuit breakers, and graceful stand-down so reliability agents fail safe instead of failing open.


Zof Reliability Team · Engineering and Product

October 15, 2025 · 8 min read · Updated October 15, 2025

Summary

An agent you can deploy but cannot stop is not an asset. It is an unbounded liability waiting for the wrong input. Reliability engineers already know this intuition from the physical world: the most important component in a substation is not the thing that closes the circuit, it is the protective relay that opens it. The same logic has to govern the agents now operating inside your software systems, and most teams have not built it yet. The industry is shipping autonomy faster than it is shipping the means to contain it. Roughly 41% of codebases are now AI-generated, and around 45% of AI coding tasks introduce a critical flaw or security issue. When an agent is also acting on the system, that defect rate is not a code-review problem anymore. It is an operational one. This guide is about the part nobody demos: how a reliability agent stands down cleanly when conditions go bad, so a failing agent degrades into a safe pause instead of a runaway.


Fail-safe versus fail-open is the whole game

In control engineering, a system is fail-safe when its failure mode is the safe state. A railway signal that loses power shows red. A pneumatic brake that loses pressure engages. The default-on-failure is the conservative one. Software, by contrast, tends to fail open: when the policy check times out, the request goes through; when the monitor crashes, the automation keeps running blind.

For a reliability agent, failing open is the dangerous default and it is usually the accidental one. Consider an agent authorized to restart unhealthy pods. Its health signal goes stale because the metrics pipeline is itself degraded. A fail-open agent keeps "remediating" against phantom data, restarting healthy pods into an outage it manufactured. A fail-safe agent treats a stale or untrustworthy input as a stop condition, not a green light.

The design rule is blunt: the absence of a positive, fresh authorization to continue must be treated as a command to stop. This inverts the usual default. You are not building a switch that turns the agent off. You are building an agent that is off unless something keeps telling it it is safe to be on.
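As a minimal sketch of that inverted default, the gate below returns True only when it holds a positive, fresh reading, and treats every other case (missing signal, stale signal, implausible timestamp) as a stop. The names `Signal`, `may_act`, and `max_age_s` are hypothetical and chosen for illustration; they are not part of any product API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    value: float
    observed_at: float  # epoch seconds when the reading was taken


def may_act(signal: Optional[Signal], max_age_s: float,
            now: Optional[float] = None) -> bool:
    """Return True only on positive, fresh evidence. Everything else is a stop."""
    now = time.time() if now is None else now
    if signal is None:
        # The monitor crashed or the check timed out: no evidence, so stop.
        return False
    age = now - signal.observed_at
    if age < 0 or age > max_age_s:
        # Timestamp from the future (clock skew) or stale data: do not trust it.
        return False
    return True
```

Note that there is no branch that returns True by default. A fail-open version of the same function would differ by a single line, which is why the default is worth reviewing explicitly.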

The dead-man's switch: continuation requires a live heartbeat

Borrow the pattern from the locomotive cab. The train moves only while the operator is actively holding the control; release it, and the train brakes. Encode the same dependency into agent execution.

Concretely, an agent's authority to keep acting should be leased, not granted once. Before each action, or each small batch of actions, the agent must reconfirm three things: its input signals are fresh and within sanity bounds, its policy lease is still valid, and no higher-priority stop has been asserted. If any check fails or simply does not answer in time, the agent stands down. No answer is a stop, not a retry-forever.
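The three reconfirmations can be sketched as a single pre-action check. This is an illustration under stated assumptions: `Lease`, `should_continue`, and `run` are hypothetical names, the lease is assumed to be issued by an external governance service, and a check that returns `None` or raises stands in for "did not answer in time".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Lease:
    holder: str
    expires_at: float  # set by the governance layer, never by the agent


# A check returns True/False, or None when it could not answer in time.
Check = Callable[[], Optional[bool]]


def _ask(check: Check) -> Optional[bool]:
    """Run a check; any exception (timeout, crash) counts as no answer."""
    try:
        return check()
    except Exception:
        return None


def should_continue(lease: Optional[Lease], now: float,
                    inputs_fresh: Check, stop_asserted: Check) -> Tuple[bool, str]:
    if lease is None or now >= lease.expires_at:
        return False, "lease missing or expired"
    if _ask(inputs_fresh) is not True:      # False or no answer both stop
        return False, "inputs stale or unanswered"
    if _ask(stop_asserted) is not False:    # True or no answer both stop
        return False, "stop asserted or unanswered"
    return True, "ok"


def run(actions: List[Callable[[], str]], lease: Optional[Lease],
        clock: Callable[[], float], inputs_fresh: Check,
        stop_asserted: Check) -> Tuple[List[str], str]:
    """Reconfirm authority before every action; stand down on the first failure."""
    done: List[str] = []
    for action in actions:
        ok, reason = should_continue(lease, clock(), inputs_fresh, stop_asserted)
        if not ok:
            return done, reason
        done.append(action())
    return done, "completed"
```

The asymmetric comparisons (`is not True`, `is not False`) are the point of the sketch: each check has exactly one answer that permits continuing, and silence is never that answer.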

This is where the control layer earns its place. A live System Graph gives the agent a change-aware model of what it is acting on, so "is this input sane" is checked against real dependencies rather than a hardcoded threshold. Governance holds the lease and the policy that issues it. The agent does not own its own off switch, which is the entire point: an agent that can grant itself permission to continue has no meaningful stop.

Circuit breakers: trip on a pattern, not a single fault

A kill switch is binary and human-triggered. A circuit breaker is automatic and proportional, and it is the more important of the two for day-to-day operation. The breaker watches the agent's recent behavior and trips when a dangerous pattern emerges, before a human has even noticed.

Design your breakers around the failure modes that actually hurt:

Action-rate breaker. If the agent proposes or executes actions far faster than its historical baseline, trip. A remediation loop that suddenly wants to touch forty services in two minutes is either responding to a real mass event (which a human should see) or it is thrashing.
Repeated-remediation breaker. If the same fix is applied to the same target N times without the underlying symptom clearing, the fix is wrong. Stop reapplying it. This is the agent equivalent of a relay that refuses to re-close into a fault.
Novelty breaker. If the agent's proposed action falls outside the distribution of what it has done before, route to a human instead of executing. Unprecedented is not the same as wrong, but it is exactly the case where confidence is least calibrated.
Blast-radius breaker. If the cumulative scope of recent actions exceeds a budget (number of services touched, percentage of fleet affected, irreversibility), pause and require fresh authorization, regardless of whether each individual action was valid.
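Two of the breakers above, action-rate and repeated-remediation, can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: `AgentBreaker` and its parameters are hypothetical, the rate limit is a fixed budget rather than a learned historical baseline, and the novelty and blast-radius breakers are left out for brevity.

```python
from collections import Counter, deque
from typing import Deque, Optional


class AgentBreaker:
    """Trips on a pattern of behavior and stays tripped until a human resets it."""

    def __init__(self, max_actions: int, window_s: float, max_repeats: int):
        self.max_actions = max_actions    # actions allowed per sliding window
        self.window_s = window_s          # sliding window length in seconds
        self.max_repeats = max_repeats    # same fix on same target, symptom uncleared
        self._times: Deque[float] = deque()
        self._repeats: Counter = Counter()
        self.tripped_reason: Optional[str] = None

    def allow(self, fix: str, target: str, now: float) -> bool:
        if self.tripped_reason is not None:
            return False  # no automatic re-close into a fault
        # Drop timestamps that have aged out of the window.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) >= self.max_actions:
            self.tripped_reason = "action rate exceeded"
            return False
        if self._repeats[(fix, target)] >= self.max_repeats:
            self.tripped_reason = f"{fix} on {target} repeated without clearing"
            return False
        self._times.append(now)
        self._repeats[(fix, target)] += 1
        return True

    def symptom_cleared(self, fix: str, target: str) -> None:
        """Call when the fix worked, so later occurrences start from zero."""
        self._repeats.pop((fix, target), None)

    def reset(self) -> None:
        """Intended to be called by a human operator, not by the agent."""
        self.tripped_reason = None
```

A usage sketch: with `AgentBreaker(max_actions=40, window_s=120, max_repeats=3)`, the fourth `allow("restart", "pod-a", ...)` call returns False if `symptom_cleared` was never called in between, and every later call is refused until `reset()`.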

The breaker's job is not to be right about the diagnosis. It is to be conservative about the trajectory. A tripped breaker that turns out to be a false alarm costs you a human glance. A breaker that should have tripped and did not costs you the incident.

Graceful means clean state, not just stopped

Stopping abruptly can be its own failure. An agent killed mid-remediation can leave a half-applied config, a drained-but-not-restored node, or a lock it never released. Graceful stand-down means the agent always has a defined safe state it can return to, and it gets there before it goes quiet.

Three properties make stand-down graceful rather than merely sudden:

Bounded, reversible steps. The agent works in increments small enough that any single step can be rolled back. This is why Remediation Fleets operate as governed, staged changes rather than one large irreversible action. You cannot gracefully abort a step you cannot undo.
Checkpoint and rollback. Before each step, the agent records the prior state. On stand-down, it either completes the current bounded step or rolls back to the last checkpoint. There is no third option where it leaves the system in an undefined middle.
Handoff with context. When the agent stops, it hands a human the full picture: what it was doing, why it stopped, what state the system is in now, and what it would do next. A stand-down that drops the operator into a mystery is not graceful; it is abandonment.
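The three properties above can be sketched together as a small step runner. Everything here is hypothetical and simplified for illustration: system state is modeled as a flat dictionary of strings so that a shallow copy is a complete checkpoint, and `Step`, `Handoff`, and `run_steps` are illustrative names. Real remediation state is rarely this easy to snapshot.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, str]


@dataclass
class Step:
    name: str
    apply: Callable[[State], None]  # mutates state in place; may raise


@dataclass
class Handoff:
    completed: List[str]   # what the agent finished
    stopped_before: str    # the step it would do next
    reason: str            # why it stood down
    state: State           # snapshot of the system as the agent left it


def run_steps(state: State, steps: List[Step],
              stop_reason: Callable[[], Optional[str]]) -> Optional[Handoff]:
    """Run bounded steps; on a stop or failure, return to a checkpoint and hand off."""
    completed: List[str] = []
    for step in steps:
        reason = stop_reason()  # None means "no stop requested"
        if reason is None:
            checkpoint = dict(state)  # record prior state before the step
            try:
                step.apply(state)
                completed.append(step.name)
                continue
            except Exception as exc:
                # Roll back to the checkpoint: no undefined middle state.
                state.clear()
                state.update(checkpoint)
                reason = f"step failed and was rolled back: {exc}"
        return Handoff(completed, step.name, reason, dict(state))
    return None  # all steps completed, nothing to hand off
```

The stop is only checked between steps, which is the design choice that makes the step boundary the safe state: a step either completes or is rolled back, and the handoff always describes a state the operator can reason about.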

This is also where the principle the whole category runs on becomes operational: agents propose, humans authorize. A stood-down agent has not failed at its job. It has correctly recognized the boundary of its authority and returned control to the people who hold it. That is the behavior you want to reward, not engineer away.

Make stand-down auditable, especially in regulated operations

In Energy and Utilities, the stop is a regulated event. When an automated system declines to act, or trips itself, an operator needs an evidence trail that holds up to a reliability authority's review. "The agent paused" is not an answer. "The agent paused at 02:14 because input age exceeded the 30-second freshness bound on the SCADA-adjacent telemetry feed; here is the signed record" is.

For agents running inside a customer boundary or a sensitive enclave, this evidence cannot live in an editable log. Edge Runners execute as signed capsules and emit audit-ready evidence from inside the boundary, so the record of why an agent stood down survives an audit rather than depending on a CI log someone could alter. The stand-down and its justification become first-class facts, not folklore reconstructed after the fact.
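To make "tamper-evident" concrete, here is a generic hash-chained log sketch in which each record is authenticated and linked to the one before it, so editing any earlier record breaks verification. This is an assumption-laden illustration and not a description of how Edge Runners sign capsules or evidence: `append_record` and `verify` are hypothetical names, and the HMAC with a shared key stands in for whatever signing scheme a real deployment uses (an asymmetric signature would usually be preferable, since anyone holding an HMAC key can also forge records).

```python
import hashlib
import hmac
import json
from typing import Dict, List


def _mac(key: bytes, prev: str, event: Dict) -> str:
    """Authenticate the event together with the previous record's MAC."""
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(log: List[Dict], key: bytes, event: Dict) -> Dict:
    prev = log[-1]["mac"] if log else ""
    record = {"prev": prev, "event": event, "mac": _mac(key, prev, event)}
    log.append(record)
    return record


def verify(log: List[Dict], key: bytes) -> bool:
    """Recompute every MAC and check that each record links to its predecessor."""
    prev = ""
    for record in log:
        expected = _mac(key, record["prev"], record["event"])
        if record["prev"] != prev or not hmac.compare_digest(record["mac"], expected):
            return False
        prev = record["mac"]
    return True


# Hypothetical stand-down event, mirroring the example in the text.
log: List[Dict] = []
key = b"demo-key-not-for-production"
append_record(log, key, {"action": "stand_down", "at": "02:14",
                         "reason": "input age exceeded 30s freshness bound"})
append_record(log, key, {"action": "handoff", "to": "on-call operator"})
```

Calling `verify(log, key)` on the untouched log returns True; changing any field of an earlier event, or reordering records, makes it return False.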

A reachability lens sharpens the same evidence for security-driven stops. Reachability-based prioritization can cut the exploitable exposure you have to act on by 70 to 90%, so a breaker that trips on a reachable, exploitable condition is reacting to real risk, and the trail shows that the risk was real rather than theoretical.

What to do Monday morning

You do not need a new platform to start. You need to find where your automation fails open today.

Inventory your agents' failure defaults. For each automated action, ask: if its input signal goes stale or its policy check times out, does it stop or continue? Every "continue" is a fail-open you are tolerating.
Add one circuit breaker. Pick your highest-blast-radius automation and wrap it in a repeated-remediation or rate breaker. Trip to a human, not to a louder retry.
Define the safe state. For one remediation path, write down the exact state the agent returns to on stand-down, and test that it actually gets there when killed mid-step.
Prove the stop. Confirm that a stand-down produces a tamper-evident record an auditor would accept, not a log line.

The bottom line


