Bug Reproduction Benchmark
Transparent methodology for measuring governed agent fleets. This page documents the methodology only; results are published as they become available, and framework pages are clearly labeled while data is still in progress.
What is measured
The rate at which a fleet goes from an issue description plus telemetry to a minimal reproducing path backed by evidence.
Reproduction cycle time dominates incident cost, so fleets must shorten time-to-reproduction without taking unsafe actions.
Methodology
Incidents are curated narratives built from sanitized, production-like fixtures. A run counts as a success only if it produces reproducible steps, attaches graph context, and includes an evidence bundle.
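The success rule above can be sketched as a predicate that requires all three criteria at once, with the reproduction rate computed as the fraction of runs that pass it. This is a minimal illustration, not the benchmark's actual harness: the `RunResult` schema, its field names, and the choice of "non-empty list" to mean "present" are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class RunResult:
    """Outcome of one agent run on one curated incident (hypothetical schema)."""
    incident_id: str
    reproducible_steps: list[str] = field(default_factory=list)
    graph_context_attached: bool = False
    evidence_bundle: list[str] = field(default_factory=list)


def is_success(run: RunResult) -> bool:
    # Assumption: a criterion is "met" when its list is non-empty / flag is set.
    # All three must hold; partial credit is not counted.
    return (
        len(run.reproducible_steps) > 0
        and run.graph_context_attached
        and len(run.evidence_bundle) > 0
    )


def reproduction_rate(runs: Iterable[RunResult]) -> float:
    """Fraction of runs meeting every success criterion; 0.0 when there are no runs."""
    results = list(runs)
    if not results:
        return 0.0
    return sum(is_success(r) for r in results) / len(results)


if __name__ == "__main__":
    runs = [
        RunResult("inc-1", ["start service", "send malformed request"], True, ["trace.json"]),
        RunResult("inc-2", ["start service"], False, ["app.log"]),  # no graph context
    ]
    print(reproduction_rate(runs))  # prints 0.5
```

Treating success as a strict conjunction keeps the metric conservative: a run that reproduces the bug but omits its evidence bundle scores the same as a run that fails outright.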
Limitations
Synthetic incidents may not capture organizational process constraints or proprietary tooling.
