Episodes

  • Can Pre-GPT AI Accelerators Handle Long Context Workloads?
    Jan 26 2026

    OpenAI's partnership with Cerebras and Nvidia's announcement of context memory storage raise a fundamental question: as agentic AI demands week-long sessions with massive context windows, can SRAM-based accelerators designed before the LLM era keep up—or will they converge with GPUs?

    Key Takeaways
    1. Context is the new bottleneck. As agentic workloads demand long sessions with massive codebases, storing and retrieving KV cache efficiently becomes critical.
    2. There's no one-size-fits-all. Sachin Katti (OpenAI, ex-Intel) signals a shift toward heterogeneous compute—matching specific accelerators to specific workloads.
    3. Cerebras has 44GB of SRAM per wafer — orders of magnitude more than typical chips — but the question remains: where does the KV cache go for long context? (A rough sizing sketch follows these takeaways.)
    4. Pre-GPT accelerators may converge toward GPUs. If they need to add HBM or external memory for long context, some of their differentiation erodes.
    5. Post-GPT accelerators (Etched, MatX) are the ones to watch. Designed specifically for transformer inference, they may solve the KV cache problem from first principles.
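    Back-of-the-envelope on why context is the bottleneck: a minimal KV-cache sizing sketch in Python, using assumed, illustrative model dimensions rather than figures from the episode.

        # KV cache per token = 2 (K and V) x layers x kv_heads x head_dim x bytes per element.
        # All dimensions below are assumptions for illustration, not any specific model.
        def kv_cache_gb(seq_len, layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2):
            """KV-cache size in GB for one sequence at FP16 with grouped-query attention."""
            per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
            return seq_len * per_token_bytes / 1e9

        for ctx in (8_000, 128_000, 1_000_000):
            print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx):6.1f} GB of KV cache")

    Under these assumptions, a single 128K-token session already needs roughly 42 GB of KV cache, on the order of an entire wafer's 44GB of SRAM, before weights or any other session are accounted for.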

    Chapters
    - 00:00 — Intro
    - 01:20 — What is context memory storage?
    - 03:30 — When Claude runs out of context
    - 06:00 — Tokens, attention, and the KV cache explained
    - 09:07 — The AI memory hierarchy: HBM → DRAM → SSD → network storage
    - 12:53 — Nvidia's G1/G2/G3 tiers and the missing G0 (SRAM)
    - 14:35 — BlueField DPUs and GPUDirect Storage
    - 15:53 — Token economics: cache hits vs misses
    - 20:03 — OpenAI + Cerebras: 750 megawatts for faster Codex
    - 21:29 — Why Cerebras built a wafer-scale engine
    - 25:07 — 44GB SRAM and running Llama 70B on four wafers
    - 25:55 — Sachin Katti on heterogeneous compute strategy
    - 31:43 — The big question: where does Cerebras store KV cache?
    - 34:11 — If SRAM offloads to HBM, does it lose its edge?
    - 35:40 — Pre-GPT vs Post-GPT accelerators
    - 36:51 — Etched raises $500M at $5B valuation
    - 38:48 — Wrap up

    38 min
  • An Interview with Innoviz CEO Omer Keilaf about current LiDAR market dynamics
    Jan 22 2026

    Innoviz CEO Omer Keilaf believes the LIDAR market is down to its final players—and that Innoviz has already won its seat.

    In this conversation, we cover the Level 4 gold rush sparked by Waymo, why stalled Level 3 programs are suddenly accelerating, the technical moat that separates L4-grade LIDAR from everything else, how a one-year-old startup won BMW, and why Keilaf thinks his competitors are already out of the race.

    Omer Keilaf founded Innoviz in 2016. Today it's a publicly traded Tier 1 supplier to BMW, Volkswagen, Daimler Truck, and other global OEMs.

    Chapters
    00:00 Introduction
    00:17 Why Start a LIDAR Company in 2016?
    01:32 The Personal Story Behind Innoviz
    03:12 Transportation Is Still Our Biggest Daily Risk
    04:28 The 2012 Spark: Xbox Kinect and 3D Sensing
    06:32 From Mobile to Automotive: Finding the Right Platform
    07:54 "I Didn't Know What LIDAR Was, But I'd Do It Better"
    08:19 How a One-Year-Old Startup Won BMW
    10:04 Surviving the First Product
    11:23 From Tier 2 to Tier 1: The Volkswagen Win
    13:47 Lessons Learned Scaling Through Partners
    14:45 The SPAC Decision: A Wake-Up Call from a Competitor
    16:42 From 200 LIDAR Companies to a Handful
    17:27 NREs: How Tier 1 Status Funds R&D
    18:44 Why Automotive-First Is the Right Strategy
    19:45 Consolidation Patterns: Cameras, Radars, Airbags
    20:31 "The Music Has Stopped"
    21:07 Non-Automotive: Underserved Markets
    23:51 Working with Secretive OEMs
    25:27 The Press Release They Tried to Stop
    26:42 CES 2025: 85% of Meetings Were Level 4
    27:40 Why Level 3 Programs Are Suddenly Accelerating
    28:33 The EV/ADAS Coupling Problem
    29:49 Design Is Everything: The Holy Grail Is Behind the Windshield
    31:13 The Three-Year RFQ: Grille → Roof → Windshield
    32:32 Innoviz3: Small Enough for Behind-the-Windshield
    34:40 Innoviz2 for L4, Innoviz3 for Consumer L3
    36:38 What's the Real Difference Between L2, L3, and L4 LIDAR?
    38:51 The Mud Test: Why L4 Demands 100% Availability
    40:50 "We're the Only LIDAR Designed for Level 4"
    42:52 Patents and the Maslow Pyramid of Autonomy
    44:15 Non-Automotive Markets: Agriculture, Mining, Security
    46:15 Closing

    47 min
  • LiDAR, Explained: How It Works and Why It Matters
    Jan 19 2026

    Austin and Vik discuss why LiDAR is important for autonomy, how modern systems work, and how the technology has evolved. They compare Time of Flight and FMCW architectures, explain why wavelength choice matters, and walk through the tradeoffs between 905 nm and 1550 nm across eye safety, cost, and performance. The discussion closes with a clear-eyed look at competition, Chinese suppliers, and supply chain risk.
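    A minimal time-of-flight sketch (illustrative, not from the episode): a pulsed LiDAR measures the round trip of a laser pulse, so range = c × t / 2.

        C = 299_792_458  # speed of light in m/s

        def tof_range_m(round_trip_s):
            """Distance to target implied by a pulse's round-trip time."""
            return C * round_trip_s / 2

        def round_trip_ns(range_m):
            """Inverse: round-trip time in nanoseconds for a target at range_m."""
            return 2 * range_m / C * 1e9

        print(f"667 ns round trip -> {tof_range_m(667e-9):5.1f} m")   # ~100 m target
        print(f"200 m target      -> {round_trip_ns(200):5.0f} ns round trip")

    Timing the return to about a nanosecond resolves range to roughly 15 cm, which is why precise pulse timing (ToF) or frequency shifts (FMCW) sit at the heart of the architecture comparison.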

    Chapters

    (00:00) Introduction to LiDAR and why it matters

    (05:40) The case for LiDAR in autonomous vehicles

    (12:41) Wavelengths, eye safety, and system tradeoffs

    (15:38) How LiDAR works: Time of Flight vs. FMCW

    (20:12) Mechanical vs. solid-state LiDAR designs

    (27:31) Market dynamics, competition, and geopolitics

    36 min
  • Nvidia CES 2026
    Jan 12 2026

    Episode Summary

    Austin and Vik break down NVIDIA’s CES 2026 keynote, focusing on Vera Rubin, DGX Spark and DGX Station, uneducated investor panic, and physical AI.

    Key Takeaways

    • DGX Spark brings server-class NVIDIA architecture to the desktop at low power, aimed at developers, enthusiasts, and enterprises experimenting locally.
    • DGX Station functions more like a mini AI rack on-prem: Grace Blackwell for inference and development without full racks.
    • The historical parallel is mainframes to minicomputers, expanding compute TAM rather than displacing cloud usage.
    • On-prem AI converts some GPU rental OpEx into CapEx, appealing to CFOs.
    • NVIDIA positioned autonomy as physical AI with vision-language-action models and early Mercedes-Benz deployments in 2026.
    • Vera Rubin integrates CPU, GPU, DPU, networking, and photonics into a single platform, emphasizing Ethernet for scale-out. (Where was the InfiniBand switch?)
    • The new Vera CPU highlights rising CPU importance for agentic workloads through higher core counts, SMT, and large LPDDR capacity.
    • Rubin GPU’s move to HBM4 and adaptive precision targets inference efficiency gains and lower cost per token.
    • Context memory storage elevates SSDs and DPUs, enabling massive KV cache offload beyond HBM and DRAM.
    • Cable-less rack design and warm-water cooling show NVIDIA’s shift from raw performance toward manufacturability and enterprise polish.
    47 min
  • Insights from IEDM 2025
    Jan 8 2026

    Austin and Vik discuss key insights from the IEDM conference.

    They explore the significance of IEDM for engineers and investors, the networking opportunities it offers, and the latest innovations in silicon photonics, complementary FETs, NAND flash memory, and GaN-on-silicon chiplets.

    Takeaways

    • Penta-level NAND flash memory could disrupt the SSD market (a quick density sketch follows these takeaways)
    • GaN-on-Silicon chiplets enhance power efficiency
    • Complementary FETs (CFETs) stack n- and p-type devices to keep transistor density scaling
    • Optical scale-up has a power problem
    • The future of transistors is still bright
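    For a sense of why penta-level cell (PLC) NAND is disruptive, a quick density sketch with assumed round numbers (not figures from the talk):

        # Each NAND cell storing n bits must distinguish 2**n distinct voltage levels.
        for name, bits in (("TLC", 3), ("QLC", 4), ("PLC", 5)):
            print(f"{name}: {bits} bits/cell, {2**bits:2d} voltage levels")

        qlc_to_plc_gain = (5 - 4) / 4
        print(f"QLC -> PLC adds {qlc_to_plc_gain:.0%} more bits per cell")

    The upside is roughly 25% more capacity from the same silicon; the catch is doubling the voltage levels from 16 to 32, which squeezes read margins and endurance.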


    42 min
  • Nvidia "Acquires" Groq
    Jan 5 2026

    Key Topics

    • What Nvidia actually bought from Groq and why it is not a traditional acquisition
    • Why the deal triggered claims that GPUs and HBM are obsolete
    • Architectural trade-offs between GPUs, TPUs, XPUs, and LPUs
    • SRAM vs HBM: speed, capacity, cost, and supply chain realities
    • Groq LPU fundamentals: VLIW, compiler-scheduled execution, determinism, ultra-low latency
    • Why LPUs struggle with large models and where they excel instead (a rough capacity sketch follows this list)
    • Practical use cases for hyper-low-latency inference:
      • Ad copy personalization at search latency budgets
      • Model routing and agent orchestration
      • Conversational interfaces and real-time translation
      • Robotics and physical AI at the edge
      • Potential applications in AI-RAN and telecom infrastructure
    • Memory as a design spectrum: SRAM-only, SRAM plus DDR, SRAM plus HBM
    • Nvidia’s growing portfolio approach to inference hardware rather than one-size-fits-all
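    A rough sketch of the SRAM vs HBM capacity trade-off, using assumed round numbers (about 230 MB of on-chip SRAM per SRAM-only accelerator and 141 GB of HBM per GPU; neither figure is from the episode):

        import math

        SRAM_PER_CHIP_GB = 0.23   # assumed on-chip SRAM per SRAM-only accelerator
        HBM_PER_GPU_GB = 141.0    # assumed HBM capacity of a single modern GPU

        def chips_for_weights(params_billion, bytes_per_param, mem_per_chip_gb):
            """Chips needed just to hold the weights (ignores KV cache and activations)."""
            weight_gb = params_billion * bytes_per_param  # billions of params -> GB
            return weight_gb, math.ceil(weight_gb / mem_per_chip_gb)

        for params in (8, 70, 405):
            gb, n_sram = chips_for_weights(params, 1, SRAM_PER_CHIP_GB)   # 8-bit weights
            _, n_hbm = chips_for_weights(params, 1, HBM_PER_GPU_GB)
            print(f"{params:>4}B params: {gb:5.0f} GB of weights -> "
                  f"{n_sram:5d} SRAM-only chips vs {n_hbm} HBM GPU(s)")

    Under these assumptions, weight footprint alone pushes SRAM-only designs to hundreds or thousands of chips for frontier-scale models, while a handful of HBM GPUs suffices, which is why LPUs shine on small, latency-critical models and large models stay on HBM-based systems.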

    Core Takeaways

    • GPUs are not dead. HBM is not dead.
    • LPUs solve a different problem: deterministic, ultra-low-latency inference for small models.
    • Large frontier models still require HBM-based systems.
    • Nvidia’s move expands its inference portfolio surface area rather than replacing GPUs.
    • The future of AI infrastructure is workload-specific optimization and TCO-driven deployment.


    41 min