Seven tasks. May 29, 2026 weighted to 32.2x leverage across 94.0 human-equivalent hours in 175 Claude-minutes. Supervisory leverage closed at 313.3x.
2.4 weeks of human-equivalent throughput in 2.9 hours of Claude wall-clock. The 80.0x ceiling came from A security-scanning service Phase 1; scanner foundation (Scanner ABC + Finding model + content-addressable dedup hash + lazy registry) plus all 7 scanner integrations (semgrep/band...; the 9.6x floor sat at Origin-service rebuild finish; retired subprocess paths (orchestrator in-process + JobStore-backed synthesis/tribunal APIs + deleted SynthesisManager/TribunalManager) + standalone....
Task Log
| # | Task | Human Est. | Claude | Sup. | Factor | Sup. Factor |
|---|---|---|---|---|---|---|
| 1 | A security-scanning service Phase 1; scanner foundation (Scanner ABC + Finding model + content-addressable dedup hash + lazy registry) plus all 7 scanner integrations (semgrep/bandit/detect-secrets/pip-audit/safety/checkov/trivy) each with wrapper + normalizer + severity-CWE mapping + seeded-vuln fixture + golden + full test suite; built Semgrep inline as the template then fanned out the other 6 via a parallel workflow; fixed a scanner-normalizer import cycle via lazy discovery; full suite 174 passed / 2 skipped / 99% coverage; committed locally | 60.0h | 45m | 1m | 80.0x | 3600.0x |
| 2 | Root-caused certification-content quality instability (42 cloud certification packages, 3 repair campaigns) + committed 301 engine re-stamps + regenerated 121 single-option questions for one certification domain | 6.0h | 10m | 3m | 36.0x | 120.0x |
| 3 | Synthesis field-preservation remediation (task 1/7): raw exam_metadata passthrough + top-level fields across an origin service (runtime and service packages) so synthesis stops dropping spec fields (root cause of spec<->manifest drift); 5 files/2 repos; 607 tests green | 4.5h | 13m | 2m | 20.8x | 135.0x |
| 4 | Remediation (contained): content_profile.lessons gating + IRT-based adaptive question selection (2PL Fisher info replacing random.choice; helper + 3 tests; 4508 engine tests green); committed on staging | 4.0h | 15m | 2m | 16.0x | 120.0x |
| 5 | Rename an origin service across the monorepo: runtime and service packages + 2 repo dirs + CLI rename + path-deps/lockfiles across 8 repos; infra deferred; runtime 61 / service 546 / engine 4508 suites green; committed on staging | 9.0h | 35m | 6m | 15.4x | 90.0x |
| 6 | Origin-service rebuild; shared in-process pipelinecore (runphase/runpipeline) + made lessons/questions/tribunal stub runners real + writequestion_bank + pipeline job kind + pipeline run CLI; 16 new tests; service suite 561 green; committed on staging | 7.0h | 35m | 3m | 12.0x | 140.0x |
| 7 | Origin-service rebuild finish; retired subprocess paths (orchestrator in-process + JobStore-backed synthesis/tribunal APIs + deleted SynthesisManager/TribunalManager) + standalone CLI mode (local flags); net -415 LOC; service suite 564 green; committed on staging | 3.5h | 22m | 1m | 9.6x | 210.0x |
Aggregate Statistics
| Metric | Value |
|---|---|
| Total tasks | 7 |
| Total human-equivalent hours | 94.0 |
| Total Claude minutes | 175 |
| Total supervisory minutes | 18 |
| Total tokens | 1,398,000 |
| Weighted average leverage factor | 32.2x |
| Weighted average supervisory leverage factor | 313.3x |
| Human-equivalent weeks | 2.4 |
Analysis
The day's leverage distribution matters more than the headline figure. The 80.0x ceiling came from A security-scanning service Phase 1; scanner foundation (Scanner ABC + Finding model + content-addressable dedup hash + lazy registry) plus all 7 scanner integr...; the 9.6x floor was Origin-service rebuild finish; retired subprocess paths (orchestrator in-process + JobStore-backed synthesis/tribunal APIs + deleted SynthesisManager/TribunalMa.... Tasks at the top of the distribution share a shape: tightly-scoped specifications, clear success criteria, and minimal integration ambiguity. The AI doesn't need to discover anything new; it executes against an explicit target.
Tasks at the bottom run differently. They're either bounded by review-heavy work where every step gets verified, or they involve ambiguity that demands several rounds of trial and adjustment. The factor is real and informative, not a failure mode.
The supervisory leverage figure (313.3x today) tracks something orthogonal to wall-clock leverage. It's the ratio of human-equivalent output to human prompt-writing time. It stays high even on lower-leverage days because supervisory minutes scale with task count, not with the human-hour estimate; a 20-minute task and a 4-hour task can both be specified in two minutes of human prompt-writing.
Across the 7 tasks, the day produced roughly 2.4 weeks of senior-engineer-equivalent throughput in 2.9 hours of model wall-clock. That ratio is the practical answer to the question of how much output a single operator can move per day when the model handles the execution and the operator handles the direction.