Three tasks. May 13, 2026 came in at 54.5x weighted leverage: 80.0 human-equivalent hours delivered in 88 Claude-minutes. A quieter day: closing an observability platform's design-to-implementation gap, a deterministic diagram-edge audit pass, and a single flagship-course buildout with curriculum mapping, a study plan, and interaction tagging. Supervisory leverage closed at 480.0x.
2.0 weeks of human-equivalent throughput in 1.5 hours of Claude wall-clock. The 130.0x ceiling came from the observability platform (closed design-vs-implementation gap: 14 models + migration 0012, RBAC + API keys + audit, 30+ REST routes, 12 Celery workers, in-process MCP mount, 3...); the 15.0x floor sat at the AP course (CED mapping + 10-day study plan + V2 atom interaction tagger + goal_id bug fix + repair tooling + 354 atoms tagged with 708 interactions).
## Task Log
| # | Task | Human Est. | Claude Time | Sup. Time | Leverage | Sup. Leverage |
|---|---|---|---|---|---|---|
| 1 | an observability platform: closed design-vs-implementation gap — 14 models + migration 0012, RBAC + API keys + audit, 30+ REST routes, 12 Celery workers, in-process MCP mount, 3 ingest protocols (Prom remote_write/StatsD/syslog), 6 new frontend pages, real LLM wiring (a mid-tier model RCA + an embedding model embedd... | 65.0h | 30m | 3m | 130.0x | 1300.0x |
| 2 | Deterministic diagram edge audit: Python classifier, 6 .mmd fixes, 12 per-edge exceptions, audit doc update | 5.0h | 18m | 2m | 16.7x | 150.0x |
| 3 | an AP course: CED mapping + 10-day study plan + V2 atom interaction tagger + goal_id bug fix + repair tooling + 354 atoms tagged with 708 interactions | 10.0h | 40m | 5m | 15.0x | 120.0x |
## Aggregate Statistics
| Metric | Value |
|---|---|
| Total tasks | 3 |
| Total human-equivalent hours | 80.0 |
| Total Claude minutes | 88 |
| Total supervisory minutes | 10 |
| Total tokens | 490,000 |
| Weighted average leverage factor | 54.5x |
| Weighted average supervisory leverage factor | 480.0x |
| Human-equivalent weeks | 2.0 |
## Analysis
The day's leverage distribution matters more than the headline figure. The 130.0x ceiling came from the observability platform (closed design-vs-implementation gap: 14 models + migration 0012, RBAC + API keys + audit, 30+ REST routes, 12...); the 15.0x floor was the AP course (CED mapping + 10-day study plan + V2 atom interaction tagger + goal_id bug fix + repair tooling + 354 atoms tagged with 708...). Tasks at the top of the distribution share a shape: tightly scoped specifications, clear success criteria, and minimal integration ambiguity. The AI doesn't need to discover anything new; it executes against an explicit target.
Tasks at the bottom run differently. They're either bounded by review-heavy work where every step gets verified, or they involve ambiguity that demands several rounds of trial and adjustment. The lower factor is real and informative, not a failure mode.
The supervisory leverage figure (480.0x today) tracks something orthogonal to wall-clock leverage. It's the ratio of human-equivalent output to human prompt-writing time. It stays high even on lower-leverage days because supervisory minutes scale with task count, not with the human-hour estimate; a 20-minute task and a 4-hour task can both be specified in two minutes of human prompt-writing.
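The two ratios described above can be reproduced directly from the task log. A minimal sketch, assuming a 40-hour work week for the human-equivalent-weeks conversion; the tuple layout is illustrative, not the log's actual storage format:

```python
# Per-task figures from the task log:
# (human_est_hours, claude_minutes, supervisory_minutes)
tasks = [
    (65.0, 30, 3),   # observability platform
    (5.0, 18, 2),    # diagram edge audit
    (10.0, 40, 5),   # AP course buildout
]

human_min = sum(h * 60 for h, _, _ in tasks)   # 4800.0 human-equivalent minutes
claude_min = sum(c for _, c, _ in tasks)       # 88 Claude minutes
sup_min = sum(s for _, _, s in tasks)          # 10 supervisory minutes

# Wall-clock leverage: human-equivalent output per minute of model time.
leverage = human_min / claude_min              # ~54.5x

# Supervisory leverage: human-equivalent output per minute of prompt-writing.
sup_leverage = human_min / sup_min             # 480.0x

# Human-equivalent weeks, assuming a 40-hour week.
weeks = (human_min / 60) / 40                  # 2.0

print(f"{leverage:.1f}x wall-clock, {sup_leverage:.1f}x supervisory, {weeks:.1f} weeks")
```

Note that the "weighted average" factor falls out of the totals: summing human minutes and dividing by summed Claude minutes is equivalent to averaging the per-task factors weighted by Claude time.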
May 13 was a low-task-count day, but with one large, high-leverage build (the observability platform). When a single agent gets handed a coherent implementation spec covering 14 models, ~30 routes, RBAC, audit logging, and Celery workers, the ratio of AI output to human prompt-writing time approaches its practical ceiling. Days like this produce big numbers from small task counts.
Across the 3 tasks, the day produced roughly 2.0 weeks of senior-engineer-equivalent throughput in 1.5 hours of model wall-clock. That ratio is the practical answer to the question of how much output a single operator can move per day when the model handles the execution and the operator handles the direction.