GPT-5.5 Benchmarks Near Parity with Claude Mythos Preview: 71.4% vs 68.6%

Sam Altman shared benchmark data showing GPT-5.5 achieving a 71.4% (±8.0%) average pass rate on agentic tasks versus Claude Mythos Preview at 68.6% (±8.7%), a 2.8-point gap that falls well inside both reported margins of error and is therefore a statistical near-tie. A separate demonstration showed GPT-5.5 completing a task estimated at 12 hours of expert human work in under 11 minutes, at a total compute cost of $1.73, highlighting both capability and economic efficiency at this frontier tier.
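To see why the gap counts as a near-tie, a minimal sketch, assuming the reported ± values are symmetric margins of error around each mean (the announcement does not specify the interval construction):

```python
def intervals_overlap(mean_a: float, moe_a: float,
                      mean_b: float, moe_b: float) -> bool:
    """Return True if the two mean ± margin-of-error intervals overlap."""
    lo_a, hi_a = mean_a - moe_a, mean_a + moe_a
    lo_b, hi_b = mean_b - moe_b, mean_b + moe_b
    return max(lo_a, lo_b) <= min(hi_a, hi_b)

# Reported figures: GPT-5.5 at 71.4% ±8.0%, Claude Mythos Preview at 68.6% ±8.7%
# Intervals [63.4, 79.4] and [59.9, 77.3] overlap, so the gap is not resolvable.
print(intervals_overlap(71.4, 8.0, 68.6, 8.7))  # True
```

Overlapping intervals do not prove the models are equal, but they mean the reported spread is too wide to rank them on this benchmark alone.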

Why It Matters

Near-parity at the frontier means competitive differentiation is shifting from raw benchmark performance to ecosystem, pricing, safety posture, and integration depth. The timing of Anthropic's concurrent sycophancy-study publication appears to be a direct response to this benchmark narrative.