Running Testing Fleets Inside a Bank's Secure Enclave with Edge Runners

How signed-capsule Edge Runners let Testing Fleets validate inside a bank's secure enclave with no inbound access, customer-controlled execution, and audit-ready evidence.

Zof Reliability Team · Engineering & Product

October 21, 2025 · 6 min read · Updated October 21, 2025

Summary

A bank's most sensitive workflows (payments, KYC onboarding, core banking integrations) live in segments that prohibit inbound vendor access and forbid protected applications from calling external models. That constraint is non-negotiable, and it is exactly where validation matters most. The signed-capsule deployment pattern lets coordinated Testing Fleets plan outside the boundary but execute *inside* it, leaving audit-ready evidence behind without ever punching a hole in the network. This walkthrough is for security leads who have already accepted that AI-generated code needs continuous validation and now need to know, concretely, how it runs in an enclave without violating the controls they are accountable for.

The constraint that breaks most testing tools

Most validation tooling assumes one of two things: either your application reaches out to a SaaS control plane, or the vendor reaches in. Both are disqualifying in a regulated bank. Inbound vendor access widens the attack surface and complicates every audit. Runtime calls from a protected segment to an external model introduce an uncontrolled egress path and a data-residency problem your examiners will not accept.

The pressure to solve this is not academic. Industry research now puts AI-generated code at roughly 41% of codebases, and roughly 45% of AI coding tasks introduce a critical flaw or security issue. Meanwhile about 80% of developers admit to bypassing policy or guardrails under deadline. The volume and the defect rate are both climbing, and the highest-risk surfaces are precisely the ones you have walled off. You cannot validate what you have made unreachable, and "we'll test it in a lower environment that doesn't mirror prod" is how silent failures reach production.

So the real engineering question is not "can agents test our software?" but "can validation happen entirely under our control, with evidence our auditors recognize, and no new trust granted to anyone outside the boundary?"

Split the planes: where intelligence ends and execution begins

Edge Runners solve this by separating the system into three planes with a hard line between them.

Intelligence Plane: planning and generation. The System Graph models your services, dependencies, and CI/CD so validation is change-aware; from that, fleets generate test plans and assemble capsules. This runs in Zof Cloud, your private cloud, or on-prem, wherever your policy permits. Critically, it does not execute anything against protected applications.
Control Plane: your governance surface. Policies, cryptographic signing, role-based approvals, and audit trails live here. This is where "agents propose, humans authorize" becomes a literal gate, not a slogan.
Execution Plane: inside the protected segment. Local edge runners execute browser, API, and workflow checks against internal endpoints and write evidence to a customer-controlled store.

The asymmetry is the whole point. Intelligence about your system can be generated outside the boundary; execution against your system never leaves it. Sensitive data and runtime behavior stay inside, while the expensive, model-heavy planning happens where you allow it.
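One way to picture the separation is as a per-plane allow-list of actions. The sketch below is illustrative only: the action names are hypothetical and are not taken from any Zof API. It simply shows that "run checks" belongs to the Execution Plane and to no other.

```python
from enum import Enum


class Plane(Enum):
    INTELLIGENCE = "intelligence"  # planning and capsule assembly
    CONTROL = "control"            # policy, signing, approvals, audit
    EXECUTION = "execution"        # runners inside the protected segment


# Hypothetical action names, chosen only for this illustration.
ALLOWED_ACTIONS = {
    Plane.INTELLIGENCE: {"generate_plan", "assemble_capsule"},
    Plane.CONTROL: {"sign_capsule", "approve_capsule", "record_audit"},
    Plane.EXECUTION: {"run_checks", "write_local_evidence"},
}


def is_allowed(plane: Plane, action: str) -> bool:
    """Return True only if the action belongs to the given plane."""
    return action in ALLOWED_ACTIONS[plane]
```

Under this model, `is_allowed(Plane.INTELLIGENCE, "run_checks")` is false by construction: the plane that plans has no way to execute against protected applications.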

The signed capsule: an immutable, auditable unit of work

A capsule is the artifact that crosses the boundary, and its properties are chosen for exactly that crossing. A capsule is an immutable, versioned package containing the test plan, its dependencies, a manifest, content hashes, and the approval record that authorized it. It is not a script someone can edit in place. Once signed, it is fixed; any change produces a new version with a new hash and its own approval.
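A minimal sketch of that idea, assuming a simplified capsule with only a few fields: the `Capsule` class, its field names, and the `revise` helper are hypothetical, and the real capsule format is not described here. The point is only that the object cannot be edited in place and that any revision yields a new version with a different hash.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Capsule:
    version: int
    test_plan: str
    dependencies: Tuple[str, ...]
    approved_by: str

    def content_hash(self) -> str:
        """SHA-256 over a canonical JSON encoding of the capsule fields."""
        payload = json.dumps(
            {
                "version": self.version,
                "test_plan": self.test_plan,
                "dependencies": list(self.dependencies),
                "approved_by": self.approved_by,
            },
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def revise(capsule: Capsule, new_plan: str, approver: str) -> Capsule:
    """A change never mutates the original; it yields a new version
    that carries its own approval record."""
    return replace(
        capsule,
        version=capsule.version + 1,
        test_plan=new_plan,
        approved_by=approver,
    )
```

Calling `revise` leaves the original capsule untouched, so both versions can sit side by side in an audit trail with distinct hashes.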

That immutability buys three things a security lead actually cares about:

Provenance. Every capsule traces to a specific plan, a specific System Graph state, and a specific human approver. There is no anonymous "the tool did something" in the log.
Integrity. The gateway verifies the signature before anything runs. A tampered or unsigned capsule does not execute. Period.
Reproducibility. Because the unit is versioned and hashed, a finding can be re-run from the same capsule months later and produce the same evidence. That matters for the Reproduce step of the loop and for any examiner who asks you to demonstrate a result.
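The integrity property can be sketched as a verify-before-run gate. The signature scheme is not specified here, so this example uses HMAC-SHA256 from the Python standard library purely as a stand-in; a production gateway would more plausibly use asymmetric signatures so that the verifier holds no signing secret. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(capsule_bytes: bytes, key: bytes) -> str:
    """Stand-in signer: HMAC-SHA256 over the capsule bytes, hex-encoded."""
    return hmac.new(key, capsule_bytes, hashlib.sha256).hexdigest()


def verify_then_run(capsule_bytes: bytes, signature: str, key: bytes) -> str:
    """Refuse to execute unless the signature matches the capsule bytes."""
    expected = sign(capsule_bytes, key)
    # An empty signature counts as unsigned; compare_digest is constant-time.
    if not signature or not hmac.compare_digest(expected, signature):
        return "rejected"
    return "executed"
```

Changing a single byte of the capsule after signing, or presenting no signature at all, takes the `rejected` branch, so nothing runs.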

Ad hoc scripts never get promoted into the enclave. Only signed, approved capsules do, which is a meaningfully stronger posture than "trusted CI job with broad credentials."

The transfer boundary, and why it only flows one way

Capsules reach the runners through a customer-controlled gateway, which the architecture calls the transfer boundary. This is the component your network team will scrutinize, so be precise about what it does and does not do.

The gateway verifies signatures, enforces policy, stages approved capsules, and logs every transfer as a discrete, auditable event. There are no silent syncs. What it does *not* require is inbound access from Zof to your core systems. The pattern is outbound-only and policy-controlled: the protected segment pulls approved work and reports evidence upstream per your rules, rather than the vendor reaching in. That single architectural decision (no inbound holes) is what makes this defensible in a zero-trust review and compatible with PAM-style controls on the runner's privileges.
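The gateway's four duties (verify, enforce policy, stage, log) can be sketched as a single decision function. Everything here is an assumption made for illustration: the `Gateway` class, its fields, and the idea that policy is a set of allowed segments are hypothetical simplifications, not the product's interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Gateway:
    allowed_segments: Set[str]  # hypothetical policy: segments that may receive capsules
    staged: List[str] = field(default_factory=list)
    transfer_log: List[Dict[str, str]] = field(default_factory=list)

    def receive(self, capsule_id: str, signature_valid: bool,
                approved: bool, segment: str) -> bool:
        """Decide whether to stage a capsule, and log the attempt either way."""
        if not signature_valid:
            decision = "rejected: bad signature"
        elif not approved:
            decision = "rejected: not approved"
        elif segment not in self.allowed_segments:
            decision = "rejected: policy"
        else:
            decision = "staged"
            self.staged.append(capsule_id)
        # Every transfer attempt is recorded, including rejections.
        self.transfer_log.append(
            {"capsule": capsule_id, "segment": segment, "decision": decision}
        )
        return decision == "staged"
```

Note that rejections are logged as well as successes, which is what makes each transfer a discrete, reviewable event rather than a silent sync.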

Evidence follows the same discipline. By default, screenshots, logs, and reports stay in a local evidence store inside your boundary. Egress is optional, sanitized, and approved, never the default. If your policy is local-only evidence, the system honors local-only evidence.
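The egress rule reduces to a small predicate. This is a sketch with hypothetical parameter names; it assumes the three conditions named above (policy, sanitization, approval) are the only inputs.

```python
def may_egress(policy_local_only: bool, sanitized: bool, approved: bool) -> bool:
    """Evidence leaves the boundary only if policy permits egress
    and the artifact is both sanitized and approved."""
    if policy_local_only:
        return False  # local-only policy overrides everything else
    return sanitized and approved
```

The default posture falls out of the logic: with a local-only policy, or with either condition unmet, the answer is always no.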

Governance is the engineering, not the afterthought

The hardest part of autonomous reliability is not generating tests; it is fixing things safely. Unsupervised autonomous remediation inside a bank's enclave would be reckless, and Zof does not do it. Remediation Fleets can plan a fix where your policy allows planning, but a remediation only executes through the same signed-capsule, human-approval path as everything else. Agents propose; humans authorize. Governance (policy, approval, audit) is the substantive engineering here, because it is the difference between a useful control layer and a liability.
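"Agents propose; humans authorize" can be sketched as a proposal object that is inert until a human approver is recorded. The class and function names below are hypothetical, and a real approval would also be role-checked and signed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemediationProposal:
    description: str
    proposed_by: str                   # an agent identifier
    approved_by: Optional[str] = None  # a human approver, set only via approve()


def approve(proposal: RemediationProposal, human: str) -> None:
    """Record a human approval; the proposer can never approve its own fix."""
    if human == proposal.proposed_by:
        raise ValueError("proposer cannot approve its own remediation")
    proposal.approved_by = human


def can_execute(proposal: RemediationProposal) -> bool:
    """A remediation is executable only once a human approval exists."""
    return proposal.approved_by is not None
```

A freshly created proposal fails `can_execute` until `approve` has been called with an identity other than the proposing agent.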

This is also where reachability-based prioritization earns its keep. Treating every flagged issue as equal floods your team and your auditors with noise. Prioritizing by what is actually reachable and exploitable can cut the exposure your team has to chase by 70-90%, which keeps human approval focused on the changes that genuinely move risk.
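As a rough sketch of reachability-based prioritization: filter findings to those that are both reachable and exploitable, then order what remains by severity. The `Finding` fields and the numeric severity scale are assumptions for illustration; how reachability is actually determined is outside this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    id: str
    severity: int      # hypothetical scale: higher means more severe
    reachable: bool    # is the affected code path actually reachable?
    exploitable: bool  # is there a plausible way to exploit it?


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Keep only reachable and exploitable findings, most severe first."""
    actionable = [f for f in findings if f.reachable and f.exploitable]
    return sorted(actionable, key=lambda f: f.severity, reverse=True)
```

Findings that fail either test drop out of the approval queue entirely, which is where the noise reduction comes from.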

What to do Monday morning

You do not need to commit the whole bank to prove the pattern. A conservative pilot path looks like this:

Pick one bounded, high-value workflow, say, a hypothetical payments-confirmation flow, and scope it to a single protected segment.
Stand up one edge runner in that segment and configure the gateway for outbound-only, signature-verified transfer with local-only evidence.
Run in observe mode first. Let fleets execute signed capsules and produce evidence with no remediation enabled. You are validating the control story before the value story.
Have your audit team review the evidence trail, not the demo. The question is whether the capsule manifests, hashes, approval records, and transfer logs satisfy *your* examiners.
Then, and only then, enable governed remediation on a narrow scope, with human approval mandatory.
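The pilot constraints above can be written down as a configuration check that your team runs before anything is deployed. The `PilotConfig` fields and the `violations` helper are hypothetical names invented for this sketch; they do not correspond to a documented configuration schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PilotConfig:
    workflow: str
    segments: Tuple[str, ...]
    runner_count: int
    gateway_outbound_only: bool
    evidence_local_only: bool
    mode: str  # "observe" or "remediate"
    human_approval_required: bool


def violations(cfg: PilotConfig) -> List[str]:
    """Return the ways a config departs from the conservative pilot path."""
    problems = []
    if len(cfg.segments) != 1:
        problems.append("scope the pilot to a single protected segment")
    if cfg.runner_count != 1:
        problems.append("stand up exactly one edge runner")
    if not cfg.gateway_outbound_only:
        problems.append("gateway must be outbound-only")
    if not cfg.evidence_local_only:
        problems.append("evidence must stay local-only")
    if cfg.mode not in ("observe", "remediate"):
        problems.append("mode must be 'observe' or 'remediate'")
    if cfg.mode == "remediate" and not cfg.human_approval_required:
        problems.append("remediation requires mandatory human approval")
    return problems
```

An empty list means the configuration matches the pilot path; starting with `mode="observe"` keeps remediation off until the audit review is done.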

If you want the deployment reference and the boundary diagram before you brief your network team, the secure enclave architecture page and the financial services solution walk through the same planes in detail.

The bottom line
