Two tasks on June 7, 2026: 36.0 human-equivalent hours completed in 90 Claude-minutes, for a weighted leverage of 24.0x. Supervisory leverage closed at 240.0x.
That is 0.9 weeks of human-equivalent throughput in 1.5 hours of Claude wall-clock. The 28.4x ceiling came from Task 1 (content remediation: strict content-audit completeness FAIL gates, remediation_inventory.py, the duplicate MCQ option-text fix, and the memory-bounded backfill generator); the 17.1x floor was Task 2 (executing and verifying the cloud question backfill: 9095 schema-faithful MCQs across 42 content packages, node coverage 77.8%->99.98%, peak RSS 282MB).
Task Log
| # | Task | Human Est. | Claude | Sup. | Factor | Sup. Factor |
|---|---|---|---|---|---|---|
| 1 | Content remediation: strict content-audit completeness FAIL gates + remediation_inventory.py (prioritized, 0/289 pass strict bar) + duplicate MCQ option-text code-defect fix via memory-bounded third-party-model regen (189 options/36 cloud pkgs, 82% distractor==correct, 0 residual, peak 178MB) + memory-bounded question backfill generator for 9108 uncovered nodes (smoke-verified) | 26.0h | 55m | 6m | 28.4x | 260.0x |
| 2 | Executed + verified cloud question backfill: 9095 schema-faithful MCQs across 42 content packages, node coverage 77.8%->99.98% (10 degenerate skipped), 0 dup options, peak RSS 282MB, run records committed+pushed to staging | 10.0h | 35m | 3m | 17.1x | 200.0x |
Aggregate Statistics
| Metric | Value |
|---|---|
| Total tasks | 2 |
| Total human-equivalent hours | 36.0 |
| Total Claude minutes | 90 |
| Total supervisory minutes | 9 |
| Total tokens | 950,000 |
| Weighted average leverage factor | 24.0x |
| Weighted average supervisory leverage factor | 240.0x |
| Human-equivalent weeks | 0.9 |
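The aggregate figures can be reproduced from the Task Log. The sketch below is a minimal illustration, not the report generator itself: it assumes that "weighted average" means total human-equivalent minutes divided by total minutes spent, and that a human-equivalent week is 40 hours. Both assumptions are consistent with the numbers in the tables; the names `Task`, `factor`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    human_hours: float          # "Human Est." column, in hours
    claude_minutes: float       # "Claude" column, in minutes
    supervisory_minutes: float  # "Sup." column, in minutes


def factor(human_hours: float, minutes: float) -> float:
    """Human-equivalent minutes divided by minutes actually spent."""
    return human_hours * 60.0 / minutes


def summarize(tasks, hours_per_week: float = 40.0) -> dict:
    """Aggregate a task log; hours_per_week=40 is an assumption."""
    total_hours = sum(t.human_hours for t in tasks)
    total_claude = sum(t.claude_minutes for t in tasks)
    total_sup = sum(t.supervisory_minutes for t in tasks)
    return {
        "total_human_hours": total_hours,
        "total_claude_minutes": total_claude,
        "total_supervisory_minutes": total_sup,
        # Ratio of totals, i.e. a time-weighted average of per-task factors.
        "leverage": factor(total_hours, total_claude),
        "supervisory_leverage": factor(total_hours, total_sup),
        "human_weeks": total_hours / hours_per_week,
    }


# The two rows of the Task Log above.
TASKS = [Task(26.0, 55, 6), Task(10.0, 35, 3)]
SUMMARY = summarize(TASKS)

if __name__ == "__main__":
    for number, task in enumerate(TASKS, start=1):
        wall = factor(task.human_hours, task.claude_minutes)
        sup = factor(task.human_hours, task.supervisory_minutes)
        print(f"task {number}: {wall:.1f}x wall-clock, {sup:.1f}x supervisory")
    print(f"weighted: {SUMMARY['leverage']:.1f}x, "
          f"supervisory: {SUMMARY['supervisory_leverage']:.1f}x, "
          f"weeks: {SUMMARY['human_weeks']:.1f}")
```

Dividing totals rather than averaging the two per-task factors keeps the longer task from being under-weighted, which is why the 24.0x headline sits closer to 28.4x than to 17.1x.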
Analysis
The day's leverage distribution matters more than the headline figure. The 28.4x ceiling came from Task 1 (content remediation); the 17.1x floor was Task 2 (executing and verifying the cloud question backfill). Tasks at the top of the distribution share a shape: tightly scoped specifications, clear success criteria, and minimal integration ambiguity. The AI doesn't need to discover anything new; it executes against an explicit target.
Tasks at the bottom run differently. They are bounded either by review-heavy work where every step gets verified, or by ambiguity that demands several rounds of trial and adjustment. A lower factor on such tasks is real and informative, not a failure mode.
The supervisory leverage figure (240.0x today) tracks something orthogonal to wall-clock leverage. It's the ratio of human-equivalent output to human prompt-writing time. It stays high even on lower-leverage days because supervisory minutes scale with task count, not with the human-hour estimate; a 20-minute task and a 4-hour task can both be specified in two minutes of human prompt-writing.
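The 20-minute versus 4-hour comparison can be made concrete with a short calculation. This is an illustrative sketch only: it reads both durations as human-equivalent estimates and takes the two-minute prompt-writing time from the paragraph above; the function name `supervisory_factor` is hypothetical.

```python
def supervisory_factor(human_minutes: float, prompt_minutes: float) -> float:
    """Human-equivalent minutes produced per minute of prompt-writing."""
    return human_minutes / prompt_minutes


# Both tasks are assumed to take two minutes to specify.
PROMPT_MINUTES = 2.0
SMALL_TASK = supervisory_factor(20.0, PROMPT_MINUTES)       # 20-minute task
LARGE_TASK = supervisory_factor(4 * 60.0, PROMPT_MINUTES)   # 4-hour task

if __name__ == "__main__":
    print(f"20-minute task: {SMALL_TASK:.1f}x")
    print(f"4-hour task:    {LARGE_TASK:.1f}x")
```

With prompt time held constant, the supervisory factor grows linearly with the human-hour estimate, so a day made up of a few large tasks keeps the supervisory figure high regardless of how the wall-clock factor moves.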
Across the two tasks, the day produced roughly 0.9 weeks of human-equivalent throughput in 1.5 hours of model wall-clock. That ratio is a practical answer to how much output a single operator can move per day when the model handles execution and the operator handles direction.