Benchmark framework · results pending

Remediation Fleet Benchmarks

Measure reproduction speed, root-cause quality, fix proposal safety, and validation reliability, without publishing unverified fix success rates.

Benchmark framework, results pending. Methodology and measurement definitions are published; performance numbers appear only after completed runs.

Why this benchmark matters

Remediation fleets only create enterprise value if they shorten incident cycles while respecting approval policy. Buyers need benchmarks that score governance and verification, not auto-merge hype.

Metrics measured

What this suite tracks

Time to reproduce bug (minutes)

Wall-clock time from the incident signal to a minimal reproducing path with evidence.

Time to root-cause (minutes)

Time to attach a graph-backed root-cause hypothesis with supporting telemetry.

Time to generate candidate fix (minutes)

Time to produce a staged proposal with a diff, tests, and a rollback plan.

Human approval cycle time (minutes)

Elapsed time in the approval queue, excluding engineer idle time.

Fix validation success rate (rate)

Share of approved fixes that pass verify-after-fix suites.

Rollback / verification reliability (rate)

Rate of successful rollback or verification when validation fails.
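To make the denominators concrete, here is a minimal Python sketch of how per-run records could roll up into three of the metrics above. The `RunRecord` fields and helper names are hypothetical, not a published schema, and the sketch assumes that an approved fix which never reaches verify-after-fix counts as not passing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RunRecord:
    # Hypothetical per-run record; field names are illustrative only.
    signal_at: datetime                  # incident signal received
    repro_at: Optional[datetime]         # minimal reproducing path recorded (None if never reproduced)
    approved: bool                       # a human approved the staged fix
    validation_passed: Optional[bool]    # verify-after-fix outcome (None if never run)
    recovered: Optional[bool]            # rollback or verification succeeded after a failed validation


def time_to_reproduce_minutes(run: RunRecord) -> Optional[float]:
    """Wall-clock minutes from incident signal to reproduction."""
    if run.repro_at is None:
        return None
    return (run.repro_at - run.signal_at).total_seconds() / 60.0


def fix_validation_success_rate(runs: List[RunRecord]) -> Optional[float]:
    """Share of approved fixes whose verify-after-fix suite passed."""
    approved = [r for r in runs if r.approved]
    if not approved:
        return None
    # Assumption: an approved fix with no validation result counts as not passing.
    return sum(1 for r in approved if r.validation_passed is True) / len(approved)


def rollback_verification_reliability(runs: List[RunRecord]) -> Optional[float]:
    """Share of failed validations that ended in a successful rollback or verification."""
    failed = [r for r in runs if r.validation_passed is False]
    if not failed:
        return None
    return sum(1 for r in failed if r.recovered is True) / len(failed)
```

Returning `None` rather than 0 for an empty denominator keeps "no data yet" from reading as a measured failure.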

Methodology

How we measure

Success requires reproducible steps, graph context, staged proposals, recorded approvals, and verify-after-fix execution. Policy violations fail the run regardless of fix quality.
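The pass/fail rule above can be sketched as a small scoring function. This is a minimal Python illustration under stated assumptions: the evidence flags are hypothetical, a recorded approval is assumed to belong to the proposal phase, and `"policy"` is an assumed label for runs failed by a policy violation, since the phase list names only repro, RCA, proposal, and verify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RunEvidence:
    # Hypothetical evidence flags for one benchmark run.
    reproducible_steps: bool
    graph_context: bool
    staged_proposal: bool
    recorded_approval: bool
    verify_after_fix_ran: bool
    policy_violation: bool


def score_run(ev: RunEvidence) -> Tuple[bool, Optional[str]]:
    """Return (success, failed_phase); failed_phase is None when the run succeeds."""
    # A policy violation fails the run regardless of fix quality.
    if ev.policy_violation:
        return False, "policy"
    # Check requirements in pipeline order so the earliest missing phase is reported.
    checks = [
        ("repro", ev.reproducible_steps),
        ("rca", ev.graph_context),
        ("proposal", ev.staged_proposal and ev.recorded_approval),
        ("verify", ev.verify_after_fix_ran),
    ]
    for phase, satisfied in checks:
        if not satisfied:
            return False, phase
    return True, None
```

Checking the policy flag before anything else mirrors the rule that no amount of fix quality offsets a violation.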

Test environment	Sanitized production-like fixtures with injected defects, policy engine enabled, staging deploy target, evidence store, and approval workflow mirroring enterprise defaults.
Dataset / workload	Curated incident narratives spanning UI regressions, API contract breaks, race conditions, and config drift. Adversarial scenarios test policy bypass attempts.
Sample size	Minimum 25 incidents × 2 policy profiles (to be confirmed at first run).
Number of runs	3 attempts per incident with fixed seeds; failures classified by phase (repro, RCA, proposal, verify).
Variance	Not yet measured. Future runs will report p50, p95, and coefficient of variation.
Excluded runs	None defined until the first benchmark run is completed.
Date last run	Pending first benchmark run
Version tested	Pending first benchmark run
Repeatability	Incident pack version, policy hash, and agent versions are pinned. Evidence bundles export for third-party replay.
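As a sketch of the planned variance reporting, the Python below computes p50, p95, and the coefficient of variation for one metric's samples, plus the run count implied by the sample-size row. It assumes every incident runs under both policy profiles with 3 attempts each (25 × 2 × 3 = 150 runs) and uses linear-interpolation percentiles from `statistics.quantiles`; the benchmark's actual estimator is not specified.

```python
import statistics
from typing import Dict, Sequence


def planned_run_count(incidents: int = 25, policy_profiles: int = 2, attempts: int = 3) -> int:
    # Assumption: each incident runs under every policy profile, `attempts` times each.
    return incidents * policy_profiles * attempts


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """p50, p95, and coefficient of variation (sample stdev / mean) for one metric."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # 99 cut points; index 94 is the 95th percentile (inclusive = linear interpolation).
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    mean = statistics.mean(samples)
    return {
        "n": len(samples),
        "p50": statistics.median(samples),
        "p95": cuts[94],
        "cv": statistics.stdev(samples) / mean if mean else float("nan"),
    }
```

Reporting p95 next to p50 exposes long-tail incidents that a mean would hide, and the coefficient of variation makes spread comparable across metrics measured on different scales.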

Assumptions

- No fix is auto-applied without explicit approval in the benchmark profile.
- Verify-after-fix runs use the same fleet configuration as detection.
- Synthetic incidents may omit org-specific runbooks.

Results

Results pending first benchmark run

This page does not display performance numbers until completed runs pass validation. When published, results will include confidence ranges and sample sizes.

Metric	Value	Confidence range	Notes
Time to reproduce bug	Pending	-	Awaiting completed runs
Time to root-cause	Pending	-	Awaiting completed runs
Time to generate candidate fix	Pending	-	Awaiting completed runs
Human approval cycle time	Pending	-	Awaiting completed runs
Fix validation success rate	Pending	-	Awaiting completed runs
Rollback / verification reliability	Pending	-	Awaiting completed runs
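For the rate metrics, one common way to attach a confidence range is the Wilson score interval, which stays within [0, 1] and behaves better than the normal approximation at small sample sizes such as a few dozen incidents. The page does not say which interval method will be used; the Python below is a sketch of that one option at an assumed 95% level (z = 1.96).

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to guard against tiny floating-point excursions outside [0, 1].
    return max(0.0, center - half), min(1.0, center + half)
```

For a hypothetical 8 passes out of 10, the interval is roughly 0.49 to 0.94, which illustrates why small incident packs produce wide ranges and why sample size belongs next to every rate.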

Limitations

What this benchmark does not claim

- Synthetic incidents may not capture proprietary tooling or change-management constraints.
- Lab policies may differ from your production policy set; map controls during architecture review.
- Until results are published, no fix success rates or speedup percentages are stated.

Enterprise interpretation

Evaluate whether remediation fleets compress reproduction and RCA time while keeping humans in control. Published metrics will separate proposal quality from approval latency.

Next steps

Evaluate Zof against your reliability requirements

Review methodology, run a structured assessment, or benchmark against your workflow with enterprise architects.
