Trend
Gemma 4 E2B Achieves 255 tok/s In-Browser with WebGPU Kernels
AI intel briefing
Core summary
This update in one sentence
Gemma 4 E2B has been shown running in-browser inference at 255 tokens/second on an M4 Max, using WebGPU kernels that were optimized with help from Fable 5.
Impact & opportunity
What this could mean
Developers could use WebGPU kernels optimized this way to run high-throughput LLM inference directly in the browser, which would expand what client-side AI applications can do without a server round trip.
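For readers who want to try this kind of client-side setup, the sketch below shows two small pieces that any in-browser inference demo would likely need: a check that WebGPU is available, and a tokens/second calculation for comparing against a figure like the one reported above. The names `GpuLike`, `hasWebGpu`, and `tokensPerSecond` are hypothetical and do not come from the original demo; the only real browser API assumed is the standard WebGPU entry point `navigator.gpu.requestAdapter()`.

```typescript
// Minimal shape of the part of the WebGPU API this sketch touches.
// In a browser, `navigator.gpu` provides requestAdapter().
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// Hypothetical helper: resolves to true when a WebGPU adapter can be obtained.
// Takes a navigator-like object so it can also be exercised outside a browser.
async function hasWebGpu(nav: { gpu?: GpuLike }): Promise<boolean> {
  if (!nav.gpu) {
    return false; // browser does not expose WebGPU at all
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null; // null means no suitable GPU adapter was found
}

// Hypothetical helper: decode throughput in tokens per second.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return tokenCount / (elapsedMs / 1000);
}

// Hypothetical helper: average time budget per token at a given throughput.
// At 255 tokens/second this is roughly 3.9 ms per token.
function msPerToken(tokensPerSec: number): number {
  if (tokensPerSec <= 0) {
    throw new RangeError("tokensPerSec must be positive");
  }
  return 1000 / tokensPerSec;
}
```

In a page, `hasWebGpu` would be called with the browser's `navigator` object (a cast may be needed depending on the DOM typings in use), and `tokensPerSecond` would be fed the generated token count and a wall-clock duration taken from `performance.now()`.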