DeepSeek V4-Pro Open-Sourced with 10x KV Cache Reduction
DeepSeek has open-sourced V4-Pro (1.6T total / 49B active parameters, 1M token context) and V4-Flash (284B total / 13B active), releasing open weights and a technical report on HuggingFace. The model hit #1 trending on HuggingFace in 43 minutes, the fastest ever, drawing 500+ likes within the first 28 minutes. The defining architectural advance is a 10x KV cache reduction versus DeepSeek V3.2: at 1M context on GB300 NVLink 72 hardware, V3.2's 35.60GB KV cache allowed only 4 concurrent requests, while V4-Pro's 10x reduction raises that to approximately 40 concurrent requests on the same hardware. V4-Pro also exceeds Claude Opus 4.6 on Terminal Bench. The API was updated the same day, and the models are accessible via chat.deepseek.com in Expert Mode and Instant Mode.
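The concurrency claim follows from simple arithmetic. A minimal back-of-envelope sketch, in Python: the 35.60GB per-request figure and the 4-request baseline come from the report, but the implied node-level KV cache budget (`kv_budget_gb`) is an assumption derived from those two numbers, not a figure DeepSeek published.

```python
# Back-of-envelope check of the reported concurrency math.
V32_KV_PER_REQUEST_GB = 35.60  # V3.2 KV cache at 1M context (from the report)
BASELINE_CONCURRENCY = 4       # concurrent 1M-context requests on GB300 NVL72
REDUCTION_FACTOR = 10          # V4-Pro's claimed KV cache reduction

# Implied memory available for KV cache on the node (derived assumption).
kv_budget_gb = V32_KV_PER_REQUEST_GB * BASELINE_CONCURRENCY  # ~142.4 GB

# V4-Pro's per-request footprint shrinks by the reduction factor,
# so the same budget fits proportionally more requests.
v4_kv_per_request_gb = V32_KV_PER_REQUEST_GB / REDUCTION_FACTOR  # ~3.56 GB
v4_concurrency = round(kv_budget_gb / v4_kv_per_request_gb)

print(v4_concurrency)  # → 40
```

Because per-request KV cache is the binding constraint at 1M context, concurrency scales linearly with the reduction factor: a 10x smaller cache yields roughly 10x the concurrent requests on identical hardware.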
Why It Matters
The 10x KV cache reduction directly multiplies inference concurrency at long contexts: this is an infrastructure economics story, not just a benchmark story. Combined with open weights and confirmed support for Huawei Ascend 950 hardware, DeepSeek V4 positions itself as a credible open-source alternative to frontier proprietary models for high-concurrency deployments.