#inference-speed

Gemma 4 Gets 3x Speed Boost via MTP Speculative Decoding

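Gemma 4's MTP (multi-token prediction) speculative-decoding drafters deliver 3× more tokens per second with no loss in output quality, and ship with day-0 support in Hugging Face Transformers, MLX, and vLLM. They are released under Apache 2.0. This is one of the largest open-model inference improvements of 2026.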
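The "no quality loss" claim follows from how speculative decoding works: a cheap drafter proposes several tokens, and the full model only keeps the ones it would have produced itself. The sketch below is a generic, toy illustration of greedy (argmax) speculative decoding, not Gemma 4's MTP drafter or any library's API. `target_model` and `draft_model` are hypothetical stand-ins over integer tokens, and sampling-based setups use a rejection-sampling acceptance rule instead of the exact-match check shown here.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft k tokens cheaply, then keep only those the target agrees with.

    Returns the sequence and the number of verification rounds; in a real
    system each round is a single batched forward pass of the target model.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed token against its own greedy choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
        else:
            # All k drafts accepted: the target's next prediction comes for free.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], rounds


def target_model(ctx: List[int]) -> int:
    # Hypothetical "large" model: next token depends on the last two tokens.
    return (ctx[-1] * 3 + ctx[-2]) % 17


def draft_model(ctx: List[int]) -> int:
    # Hypothetical drafter: ignores ctx[-2] after an even token, so it is sometimes wrong.
    if ctx[-1] % 2 == 0:
        return (ctx[-1] * 3) % 17
    return (ctx[-1] * 3 + ctx[-2]) % 17


if __name__ == "__main__":
    prompt = [1, 2]
    baseline = greedy_decode(target_model, prompt, 20)
    fast, rounds = speculative_decode(target_model, draft_model, prompt, 20, k=4)
    assert fast == baseline  # identical output to plain greedy decoding
    print(f"20 tokens in {rounds} verification rounds instead of 20 target calls")
```

Every token that ends up in the output is one the target model would have chosen at that position, so the result matches plain greedy decoding exactly; the speedup comes only from verifying several drafted tokens per target pass, and it grows with how often the drafter agrees with the target.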

May 6, 2026 · 1 min read
