DeepSeek V4-Pro Open-Sourced with 10x KV Cache Reduction
DeepSeek has open-sourced V4-Pro (1.6T total / 49B active parameters, 1M token context) and V4-Flash (284B total / 13B active), releasing open weights and a technical report on HuggingFace. The model hit #1 trending on HuggingFace in 43 minutes, the fastest ever, drawing 500+ likes within the first 28 minutes. The defining architectural advance is a 10x KV cache reduction versus DeepSeek V3.2: at 1M context on GB300 NVLink 72 hardware, V3.2's 35.60GB KV cache allowed only 4 concurrent requests, while V4-Pro's 10x reduction raises that to approximately 40 concurrent requests on the same hardware. V4-Pro also exceeds Claude Opus 4.6 on Terminal Bench. The API was updated the same day, and the models are accessible via chat.deepseek.com in Expert Mode and Instant Mode.
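The concurrency claim follows from simple arithmetic. A minimal back-of-envelope sketch, in Python: the 35.60GB per-request figure and the 4-request baseline come from the report, but the implied node-level KV cache budget (`kv_budget_gb`) is an assumption derived from those two numbers, not a figure DeepSeek published.

```python
# Back-of-envelope check of the reported concurrency math.
V32_KV_PER_REQUEST_GB = 35.60  # V3.2 KV cache at 1M context (from the report)
BASELINE_CONCURRENCY = 4       # concurrent 1M-context requests on GB300 NVL72
REDUCTION_FACTOR = 10          # V4-Pro's claimed KV cache reduction

# Implied memory available for KV cache on the node (derived assumption).
kv_budget_gb = V32_KV_PER_REQUEST_GB * BASELINE_CONCURRENCY  # ~142.4 GB

# V4-Pro's per-request footprint shrinks by the reduction factor,
# so the same budget fits proportionally more requests.
v4_kv_per_request_gb = V32_KV_PER_REQUEST_GB / REDUCTION_FACTOR  # ~3.56 GB
v4_concurrency = round(kv_budget_gb / v4_kv_per_request_gb)

print(v4_concurrency)  # → 40
```

Because per-request KV cache is the binding constraint at 1M context, concurrency scales linearly with the reduction factor: a 10x smaller cache yields roughly 10x the concurrent requests on identical hardware.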
Why It Matters
The 10x KV cache reduction directly multiplies inference concurrency at long contexts: this is an infrastructure economics story, not just a benchmark story. Combined with open weights and confirmed support for Huawei Ascend 950 hardware, DeepSeek V4 positions itself as a credible open-source alternative to frontier proprietary models for high-concurrency deployments.