Technology / Breaking
Gemma 4 Gets 3x Speed Boost via MTP Speculative Decoding
Gemma 4's multi-token prediction (MTP) drafters for speculative decoding deliver 3× tokens/sec with no quality loss, with day-0 support in Hugging Face Transformers, MLX, and vLLM. Released under Apache 2.0. One of the largest open-model inference improvements of 2026.
May 6, 2026 · 1 min read