Sub Quadratic Launches subQ — Big Architecture Bet, Thin Evidence
Sub Quadratic dropped its "subQ" model with a striking headline: a 12-million-token context window powered by sparse attention, claiming 52× better compute efficiency than FlashAttention at 1M tokens, priced at under 5% of Anthropic's Opus. The architecture thesis is technically legitimate — sparse attention pre-selects semantically relevant tokens from anywhere in the context rather than scoping to a local window, avoiding the quadratic compute blowup that caps dense attention. If the claims bear out, it would matter for both cloud inference costs and local model viability.
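To make the efficiency argument concrete, here is a minimal sketch of the idea — not subQ's actual implementation, whose details are unpublished. It contrasts dense attention, which scores every query against every key (n² scores), with a top-k sparse variant in which each query pre-selects only its k most relevant keys from anywhere in the sequence (n·k scores). All names and parameters are illustrative.

```python
import numpy as np

def dense_attention(Q, K, V):
    # Dense attention: every query attends to every key -> n x n score matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def topk_sparse_attention(Q, K, V, k):
    # Sparse attention (illustrative): each query pre-selects its k most
    # similar keys anywhere in the sequence, then attends only to those,
    # so only n * k scores are computed instead of n * n.
    n, d = Q.shape
    out = np.empty_like(Q)
    for i in range(n):
        scores = Q[i] @ K.T / np.sqrt(d)
        idx = np.argpartition(scores, -k)[-k:]  # k most relevant tokens
        s = scores[idx]
        w = np.exp(s - s.max())
        w /= w.sum()
        out[i] = w @ V[idx]
    return out

rng = np.random.default_rng(0)
n, d, k = 64, 16, 8
Q, K, V = rng.standard_normal((3, n, d))
dense = dense_attention(Q, K, V)
sparse = topk_sparse_attention(Q, K, V, k)
```

With k held small as n grows, the score count scales linearly in sequence length rather than quadratically — the property that would make multi-million-token contexts tractable, assuming the pre-selection step itself is cheap.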
What the Source Actually Says
Tim Carambat (AnythingLLM) published a technical audit on launch day. His central finding: every published benchmark tests the 1M-preview model, not the headline 12M model. The 12M model had no public benchmarks and no early access — Carambat applied and expected to receive only the 1M-preview.
On SWE-Bench Verified, the 1M-preview scored 81.8 — competitive with frontier models, though Opus 4.7 scored higher. On MRCRv2 long-context retrieval at 1M tokens, Carambat spotted a direct inconsistency: the launch video shows 62% while the company's website shows 65.9% for the same test. The video also omitted the Opus 4.6 and GPT-5.5 comparison rows visible in the website's table — presenting the benchmark more favorably by leaving out higher-scoring rivals.
No technical report accompanied the launch. The "98% accuracy" framing amplified on social media is not traceable to any published benchmark artifact. Carambat's broader framing is cautious optimism: DeepSeek v4 shipped hybrid attention the prior week with similar long-context efficiency goals, confirming the direction is real even if subQ's specific claims remain unverified.
Strategic Take
Sparse attention is a credible long-context efficiency path — the convergence with DeepSeek v4's hybrid approach signals a genuine industry trend, not a one-off claim. But subQ's launch mixes real architectural ambition with unverified headline numbers and measurable benchmark inconsistencies. Hold off on roadmap commitments until an independent evaluation of the actual 12M model is available.


