Ep. 7 - The Execution Tax of Karpathy’s $20 GPT-2
About this title
Andrej Karpathy just showed that GPT-2 can now be trained in under three hours for roughly $20, reframing a once-"dangerous" model as the new MNIST. On paper, fp8 promises twice the FLOPS of bf16. In practice, it delivers something far messier: overhead, precision tradeoffs, and marginal gains that only appear after careful tuning.
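To make that gap concrete, here is a back-of-the-envelope sketch (ours, not the episode's): only the matmul-bound share of a training step benefits from faster tensor cores, so an Amdahl-style estimate caps the end-to-end win. The matmul fractions below are assumed illustrative numbers, not measurements from any run.

# Amdahl-style estimate: only the matmul-bound share of a step gets faster.
def step_speedup(matmul_fraction, matmul_speedup=2.0):
    accelerated = matmul_fraction / matmul_speedup
    untouched = 1.0 - matmul_fraction
    return 1.0 / (accelerated + untouched)

for frac in (0.9, 0.75, 0.6):
    print(f"matmul share {frac:.0%}: end-to-end speedup ~{step_speedup(frac):.2f}x")
# matmul share 90%: ~1.82x, 75%: ~1.60x, 60%: ~1.43x

And because fp8 matmuls carry their own casting and scaling overhead, the effective per-matmul speedup is itself below 2×, pushing these numbers lower still.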
In this episode of Execution Over Everything, we pressure-test what Karpathy is actually working on beneath the headline. We unpack why theoretical speedups don’t translate cleanly to wall-clock wins, how fp8 shifts cost and failure modes rather than eliminating them, and what breaks once you embed these techniques into real, repeated training workflows.
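As one illustration of where that shifted cost lives (a simulation written for these notes, not code from Karpathy's run): per-tensor fp8 scaling needs an extra amax reduction and two casts for every tensor, and anything outside the e4m3 range still clamps. The helper below only mimics e4m3 rounding; it ignores subnormals and underflow.

import math

E4M3_MAX = 448.0  # largest finite value in the OCP e4m3 format

def quantize_e4m3(x):
    """Round to the nearest e4m3 value (rough simulation: 3 mantissa bits,
    clamp at the max; subnormals and underflow are ignored)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    step = 2.0 ** math.floor(math.log2(mag)) / 8.0  # 8 steps per binade
    return sign * min(round(mag / step) * step, E4M3_MAX)

def fp8_cast_with_scale(values):
    """Per-tensor scaling adds an extra reduction plus two casts per tensor;
    that bookkeeping eats into the theoretical 2x."""
    amax = max(abs(v) for v in values)           # extra pass over the tensor
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [quantize_e4m3(v / scale) * scale for v in values]

acts = [0.001, 0.5, 3.2, 700.0]                  # 700 overflows e4m3 unscaled
print([quantize_e4m3(v) for v in acts])          # unscaled: 700 clamps to 448
print(fp8_cast_with_scale(acts))                 # scaled: range kept, precision coarser

That bookkeeping, repeated across every layer and every step, is one piece of the execution tax the episode digs into.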
This isn’t about celebrating faster demos. It’s about understanding the execution tax — the hidden costs in retries, numerics, and operational complexity that show up only when systems run continuously in the real world.
Andrej Karpathy
fp8 training
GPT-2 training
mixed precision training
H100 GPUs
GPU optimization
AI training cost
model training speed
FLOPS vs throughput
wall-clock performance
training overhead
loss curves
bf16 vs fp8
scaling laws
AI infrastructure
execution bottlenecks
retries and failure modes
production ML systems
execution tax
Execution Over Everything