Two tasks. June 18, 2026 weighted to 36.8x leverage across 52.0 human-equivalent hours in 85 Claude-minutes. Supervisory leverage closed at 624.0x.
1.3 weeks of human-equivalent throughput in 1.4 hours of Claude wall-clock. The 38.5x ceiling came from Full content audit (audit_specs.py + content-audit.py, 24 phases, 289 pkgs/592 specs) + mechanical fixes: canonical reconcile, 4 content-domain specs completed, domain docs refresh...; the 36.4x floor sat at Full deployment-readiness audit (36 changed repos via 13 fan-out agents) + applied safe fixes across 19 repos + doc/baseline updates.
Task Log
| # | Task | Human Est. | Claude | Sup. | Factor | Sup. Factor |
|---|---|---|---|---|---|---|
| 1 | Full content audit (audit_specs.py + content-audit.py, 24 phases, 289 pkgs/592 specs) + mechanical fixes: canonical reconcile, 4 content-domain specs completed, domain docs refreshed | 12.0h | 19m | 2m | 38.5x | 360.0x |
| 2 | Full deployment-readiness audit (36 changed repos via 13 fan-out agents) + applied safe fixes across 19 repos + doc/baseline updates | 40.0h | 66m | 3m | 36.4x | 800.0x |
Aggregate Statistics
| Metric | Value |
|---|---|
| Total tasks | 2 |
| Total human-equivalent hours | 52.0 |
| Total Claude minutes | 85 |
| Total supervisory minutes | 5 |
| Total tokens | 2,250,000 |
| Weighted average leverage factor | 36.8x |
| Weighted average supervisory leverage factor | 624.0x |
| Human-equivalent weeks | 1.3 |
Analysis
The day's leverage distribution matters more than the headline figure. The 38.5x ceiling came from Full content audit (audit_specs.py + content-audit.py, 24 phases, 289 pkgs/592 specs) + mechanical fixes: canonical reconcile, 4 content-domain specs completed,...; the 36.4x floor was Full deployment-readiness audit (36 changed repos via 13 fan-out agents) + applied safe fixes across 19 repos + doc/baseline updates. Tasks at the top of the distribution share a shape: tightly-scoped specifications, clear success criteria, and minimal integration ambiguity. The AI doesn't need to discover anything new; it executes against an explicit target.
Tasks at the bottom run differently. They're either bounded by review-heavy work where every step gets verified, or they involve ambiguity that demands several rounds of trial and adjustment. The factor is real and informative, not a failure mode.
The supervisory leverage figure (624.0x today) tracks something orthogonal to wall-clock leverage. It's the ratio of human-equivalent output to human prompt-writing time. It stays high even on lower-leverage days because supervisory minutes scale with task count, not with the human-hour estimate; a 20-minute task and a 4-hour task can both be specified in two minutes of human prompt-writing.
Across the 2 tasks, the day produced roughly 1.3 weeks of senior-engineer-equivalent throughput in 1.4 hours of model wall-clock. That ratio is the practical answer to the question of how much output a single operator can move per day when the model handles the execution and the operator handles the direction.