Ep. 7 - The Execution Tax of Karpathy’s $20 GPT-2

About this title

Andrej Karpathy just showed that GPT-2 can now be trained in under three hours for roughly $20, reframing a once-“dangerous” model as the new MNIST. On paper, fp8 promises 2× the FLOPS of bf16. In practice, it delivers something far messier: overhead, precision tradeoffs, and marginal gains that only appear after careful tuning.
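
To make the gap between paper FLOPS and wall-clock time concrete, here is a back-of-the-envelope sketch: only the matmul share of a training step benefits from fp8 tensor cores, and the format adds its own casting and scaling bookkeeping. Every number below is an illustrative assumption, not a measurement from Karpathy’s run.

```python
# Back-of-the-envelope: effective wall-clock speedup from fp8.
# All fractions and overheads are illustrative assumptions, not
# measurements from any specific GPT-2 training run.

matmul_fraction = 0.70   # share of step time spent in matmuls (assumed)
matmul_speedup = 2.0     # the "on paper" fp8-vs-bf16 FLOPS ratio
cast_overhead = 0.05     # added step time for casts, amax tracking, scaling (assumed)

# Amdahl-style estimate: only the matmul fraction is accelerated;
# attention softmax, optimizer, comms, and the dataloader are not.
new_step_time = (1 - matmul_fraction) + matmul_fraction / matmul_speedup + cast_overhead
effective_speedup = 1.0 / new_step_time

print(f"theoretical matmul speedup:   {matmul_speedup:.2f}x")
print(f"effective wall-clock speedup: {effective_speedup:.2f}x")
# With these assumptions: 1 / (0.30 + 0.35 + 0.05) = ~1.43x, not 2x.
```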

In this episode of Execution Over Everything, we pressure-test what Karpathy is actually working on beneath the headline. We unpack why theoretical speedups don’t translate cleanly to wall-clock wins, how fp8 shifts cost and failure modes rather than eliminating them, and what breaks once you embed these techniques into real, repeated training workflows.
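
One concrete reason the cost shifts rather than disappears: fp8 tensors need explicit scaling to fit the format’s narrow range, which means extra amax passes and scale bookkeeping around every matmul input. Below is a minimal per-tensor scale/cast/dequantize sketch, assuming a recent PyTorch build that exposes the float8_e4m3fn dtype; it is a generic illustration, not the recipe used in Karpathy’s code.

```python
import torch

def fp8_roundtrip(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Per-tensor scale -> cast to fp8 (e4m3) -> dequantize back to fp32.

    A generic mixed-precision bookkeeping sketch; assumes a PyTorch
    build with torch.float8_e4m3fn available.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max   # ~448 for e4m3
    amax = x.abs().max().clamp(min=1e-12)            # extra pass over the tensor
    scale = fp8_max / amax                           # per-tensor scale factor
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)      # lossy 8-bit cast
    x_deq = x_fp8.to(torch.float32) / scale          # back to high precision
    return x_deq, scale

x = torch.randn(4096, 4096)
x_deq, scale = fp8_roundtrip(x)
rel_err = (x - x_deq).abs().mean() / x.abs().mean()
print(f"scale={scale.item():.1f}  mean relative error={rel_err.item():.4f}")
# The 2x matmul speedup only pays off if this bookkeeping stays cheaper
# than the matmul time it saves, and if the rounding error above never
# shows up in the loss curve.
```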

This isn’t about celebrating faster demos. It’s about understanding the execution tax — the hidden costs in retries, numerics, and operational complexity that show up only when systems run continuously in the real world.
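
The retry part of that tax is easy to put numbers on: if each long run has some chance of dying partway through (a loss spike, a node drop, a bad numeric config), the expected number of attempts, and therefore the expected cost, climbs above the headline figure. The failure rate below is an assumption for illustration, not a number quoted in the episode, and it pessimistically charges a failed attempt the full cost of a run.

```python
# Illustrative execution-tax arithmetic: retries inflate the headline cost.
# The failure probability is assumed, and a failed attempt is (pessimistically)
# charged the full cost of a run rather than a partial one.

cost_per_attempt = 20.0   # dollars per attempt (the headline figure)
p_fail = 0.25             # assumed chance an attempt fails (divergence, node loss, ...)

# With independent failures, attempts follow a geometric distribution:
# expected attempts = 1 / (1 - p_fail).
expected_attempts = 1.0 / (1.0 - p_fail)
expected_cost = cost_per_attempt * expected_attempts

print(f"expected attempts: {expected_attempts:.2f}")   # 1.33
print(f"expected cost:     ${expected_cost:.2f}")      # $26.67 with these assumptions
```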


  • Andrej Karpathy

  • fp8 training

  • GPT-2 training

  • mixed precision training

  • H100 GPUs

  • GPU optimization

  • AI training cost

  • model training speed

  • FLOPS vs throughput

  • wall-clock performance

  • training overhead

  • loss curves

  • bf16 vs fp8

  • scaling laws

  • AI infrastructure

  • execution bottlenecks

  • retries and failure modes

  • production ML systems

  • execution tax

  • Execution Over Everything
