AI at Scale with Nico Martin from Hugging Face | Transformers.js, Tokenizers, On-Device Inference
About this title
Can you really run state-of-the-art machine learning models directly in the browser, with no server, no API calls, and full privacy by default?
In this episode, Nico Martin, Open Source Machine Learning Engineer at Hugging Face and Google Developer Expert in AI and Web Technologies, walks through how Transformers.js makes on-device AI a reality. Nico's journey is anything but conventional. He started as a ski and windsurf instructor, taught himself web development on the side, spent years as a freelancer (including five at a bank building e-banking front ends), and recently landed what he calls his dream job at Hugging Face.
We unpack what Hugging Face actually is (the GitHub for machine learning), how Transformers.js brings the Python Transformers API to the browser, and the real engineering challenges of running models on whatever hardware your users happen to have. Nico explains quantization, ONNX as the standard for portable model architectures, the role of tokenizers, how text becomes tensors, and why WebGPU matters for running larger models client-side.
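To make the quantization idea concrete: a minimal, purely illustrative sketch of 4-bit linear quantization, mapping each float weight to one of 16 levels between the tensor's min and max. Real Q4 schemes used for on-device models quantize per block with scales and zero-points; the function names here are invented for illustration.

```javascript
// Toy 4-bit linear quantization: each weight becomes a code in 0..15,
// so it fits in 4 bits instead of 16/32. Dequantizing recovers an
// approximation of the original value, within half a quantization step.
function quantize4bit(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 15; // 16 levels => 4 bits per weight
  const codes = weights.map(w => Math.round((w - min) / scale));
  return { codes, min, scale };
}

function dequantize4bit({ codes, min, scale }) {
  return codes.map(c => min + c * scale);
}

const weights = [-0.52, 0.13, 0.98, -1.0, 0.4];
const q = quantize4bit(weights);       // codes are small integers, 0..15
const restored = dequantize4bit(q);    // close to the original floats
```

The trade-off Nico describes is exactly this: a 4x-8x smaller download at the cost of a small, bounded approximation error per weight.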
We also dig into the bigger picture: privacy-preserving AI, the difference between open weights and truly open source models, agents and MCP, and what front-end developers should actually learn to stay relevant in an AI-first world.
Key Topics:
- What Hugging Face is and the role of the Hub, Transformers, and Diffusers
- Transformers.js: bringing the Python Transformers API to JavaScript and the browser
- The biggest challenge of browser ML: running on unknown client hardware
- Quantization explained: compressing models from 16/32-bit floats down to 4-bit (Q4) weights
- ONNX and ONNX Runtime Web: the standard for portable model architectures
- Open weights vs open source models and why the distinction matters
- Tokenizers, token IDs, and why each model needs its own tokenizer
- From text to tensors: pre-processing, inference, and post-processing
- Text embeddings explained through a simple animal feature analogy
- WebGPU and what it unlocks for in-browser inference
- Agents, tool calling, MCP, and how context windows get consumed
- Advice for developers who want to break into AI and ML engineering
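The embedding analogy from the episode can be sketched in a few lines: if each animal is a vector of hand-picked features, cosine similarity measures how alike two animals are. Real text embeddings work the same way, just with hundreds of learned dimensions instead of three invented ones (the feature names below are assumptions for illustration, not from the episode).

```javascript
// Toy "embeddings": each animal is a vector of made-up features,
// e.g. [furriness, size, lives-in-water].
const embeddings = {
  cat:     [0.9, 0.2, 0.0],
  dog:     [0.8, 0.4, 0.0],
  dolphin: [0.0, 0.6, 1.0],
};

// Cosine similarity: how closely two vectors point in the same
// direction, in [-1, 1]. Higher means more similar.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// In this feature space, cat is much closer to dog than to dolphin.
const catDog = cosine(embeddings.cat, embeddings.dog);
const catDolphin = cosine(embeddings.cat, embeddings.dolphin);
```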
🔗 FOLLOW NICO
💼 LinkedIn: https://www.linkedin.com/in/nicodotdev/
🐦 X/Twitter: https://twitter.com/nic_o_martin
🦋 Bluesky: https://bsky.app/profile/nico.dev
🐙 GitHub: https://github.com/nico-martin
🌐 Website: https://nico.dev
🎙️ FOLLOW & SUBSCRIBE
📸 Instagram: https://www.instagram.com/senorsatscale/
📸 Instagram: https://www.instagram.com/neciudev
🎙 Podcast URL: https://neciudan.dev/senors-at-scale
📬 Newsletter: https://neciudan.dev/subscribe
💼 LinkedIn: https://www.linkedin.com/in/neciudan
💼 LinkedIn: https://www.linkedin.com/company/senors-scale/
📚 ADDITIONAL RESOURCES
- Transformers.js: https://huggingface.co/docs/transformers.js
- Hugging Face: https://huggingface.co
- ONNX: https://onnx.ai
- ONNX Runtime: https://onnxruntime.ai
- WebGPU: https://www.w3.org/TR/webgpu/
- Utopia for Realists by Rutger Bregman
#MachineLearning #AI #HuggingFace #TransformersJS #WebML #OnDeviceAI #WebGPU #ONNX #JavaScript #Frontend #WebDev #SenorsAtScale #OpenSource
💬 Would you trust on-device AI over cloud-based models for sensitive data? Share your thoughts in the comments!