Four tasks. May 31, 2026 weighted to 14.9x leverage across 90.0 human-equivalent hours in 362 Claude-minutes. Supervisory leverage closed at 450.0x.
2.2 weeks of human-equivalent throughput in 6.0 hours of Claude wall-clock. The 16.0x ceiling came from Accessibility audit remediation; all 71 findings (5 blocker/30 serious/24 moderate/12 minor) fixed across the client apps (web/iOS/Android/desktop) + design-system; 4 waves; per-cl...; the 8.0x floor sat at Amplified pair-density for 14 beta cohort: 60 validated contrastive pairs topping 17 starved goals to 10/goal, cleared P4.1 + manifest/stale side-effects; all 14 beta zero-findings.
Task Log
| # | Task | Human Est. | Claude | Sup. | Factor | Sup. Factor |
|---|---|---|---|---|---|---|
| 1 | Accessibility audit remediation; all 71 findings (5 blocker/30 serious/24 moderate/12 minor) fixed across the client apps (web/iOS/Android/desktop) + design-system; 4 waves; per-client build/test-gated commits | 80.0h | 300m | 8m | 16.0x | 600.0x |
| 2 | Promoted 13 alpha to beta (Tier A+B): reweighted 12, +621 recall questions for one certification domain + 17 pairs for another, all 27 beta pristine; estimates for two more content domains | 5.0h | 25m | 2m | 12.0x | 150.0x |
| 3 | One content domain to pristine+beta: 3,359 questions from 0 (fixed write bug), reweighted, +1 pair, all 28 beta zero-findings, committed+pushed | 3.0h | 22m | 1m | 8.2x | 180.0x |
| 4 | Amplified pair-density for 14 beta cohort: 60 validated contrastive pairs topping 17 starved goals to 10/goal, cleared P4.1 + manifest/stale side-effects; all 14 beta zero-findings | 2.0h | 15m | 1m | 8.0x | 120.0x |
Aggregate Statistics
| Metric | Value |
|---|---|
| Total tasks | 4 |
| Total human-equivalent hours | 90.0 |
| Total Claude minutes | 362 |
| Total supervisory minutes | 12 |
| Total tokens | 810,000 |
| Weighted average leverage factor | 14.9x |
| Weighted average supervisory leverage factor | 450.0x |
| Human-equivalent weeks | 2.2 |
Analysis
The day's leverage distribution matters more than the headline figure. The 16.0x ceiling came from Accessibility audit remediation; all 71 findings (5 blocker/30 serious/24 moderate/12 minor) fixed across the client apps (web/iOS/Android/desktop) + design-sys...; the 8.0x floor was Amplified pair-density for 14 beta cohort: 60 validated contrastive pairs topping 17 starved goals to 10/goal, cleared P4.1 + manifest/stale side-effects; all 1.... Tasks at the top of the distribution share a shape: tightly-scoped specifications, clear success criteria, and minimal integration ambiguity. The AI doesn't need to discover anything new; it executes against an explicit target.
Tasks at the bottom run differently. They're either bounded by review-heavy work where every step gets verified, or they involve ambiguity that demands several rounds of trial and adjustment. The factor is real and informative, not a failure mode.
The supervisory leverage figure (450.0x today) tracks something orthogonal to wall-clock leverage. It's the ratio of human-equivalent output to human prompt-writing time. It stays high even on lower-leverage days because supervisory minutes scale with task count, not with the human-hour estimate; a 20-minute task and a 4-hour task can both be specified in two minutes of human prompt-writing.
Across the 4 tasks, the day produced roughly 2.2 weeks of senior-engineer-equivalent throughput in 6.0 hours of model wall-clock. That ratio is the practical answer to the question of how much output a single operator can move per day when the model handles the execution and the operator handles the direction.