Gemma 4 E2B Achieves 255 tok/s In-Browser with WebGPU Kernels

June 18, 2026 · AI intel briefing

Core summary

One sentence to understand this update

Gemma 4 E2B has been shown running in-browser inference at 255 tokens/second on an M4 Max, using WebGPU kernels that were optimized with assistance from Fable 5.
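
To put the headline figure in context, here is a minimal sketch of how a tokens/second number is typically derived from a timed generation run, and what per-token latency it implies. The function names `tokensPerSecond` and `msPerToken` are hypothetical helpers introduced for illustration; the briefing does not describe how the original benchmark was measured.

```typescript
// Hypothetical helper (not from the source): throughput from a timed run.
// tokenCount = number of tokens generated, elapsedMs = wall-clock time in ms.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return tokenCount / (elapsedMs / 1000);
}

// Hypothetical helper: average time available per token at a given throughput.
function msPerToken(tokPerSec: number): number {
  if (tokPerSec <= 0) {
    throw new RangeError("tokPerSec must be positive");
  }
  return 1000 / tokPerSec;
}

// Example: 510 tokens generated in 2000 ms is 255 tokens/second,
// which leaves roughly 3.92 ms per token on average.
const rate = tokensPerSecond(510, 2000);
const budget = msPerToken(rate);
console.log(`${rate} tok/s, ${budget.toFixed(2)} ms/token`);
```

In a browser, the elapsed time would usually come from two `performance.now()` readings taken around the generation loop.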

Impact & opportunity

What this could mean

Developers could use these optimized WebGPU kernels to run high-throughput LLM inference directly in the browser, expanding what client-side AI applications can do.
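
Any in-browser deployment along these lines first has to confirm that WebGPU is usable on the visitor's machine. The sketch below shows one way to do that with the standard `navigator.gpu.requestAdapter()` call. The `NavigatorLike` type, the function names, and the `"wasm"` fallback label are assumptions made for illustration; the briefing does not say how the demo handles unsupported browsers.

```typescript
// Minimal structural type so the sketch compiles without WebGPU type packages.
// In a real page you would pass the global `navigator` object.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

type Backend = "webgpu" | "wasm"; // "wasm" is an assumed fallback label

// Synchronous check: is the WebGPU entry point exposed at all?
function hasWebGPUApi(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// Asynchronous check: the API can exist while no adapter is available,
// in which case requestAdapter() resolves to null.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!hasWebGPUApi(nav)) {
    return "wasm";
  }
  try {
    const adapter = await nav.gpu!.requestAdapter();
    return adapter !== null && adapter !== undefined ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing the GPU as "not available".
    return "wasm";
  }
}
```

In a page this would be called as `pickBackend(navigator)` before loading model weights, so that the large download only starts once a working backend is known.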
