Claude Mythos Hits 3-Hour Autonomous Task Horizon
In late May, Claude Mythos, Anthropic's invite-only model above Opus 4.8, achieved an 80% success rate on METR's benchmark for tasks requiring up to 3 hours and 6 minutes of solo autonomous work. Superforecasters surveyed when the baseline was 1.5 hours had predicted this milestone would arrive by the end of 2026; it arrived months early. The METR metric measures how long a task would take a human to complete, not how long the model runs.
Why It Matters
METR task horizons have been doubling every four months, and that pace is now producing results ahead of expert predictions. The effect is to shorten the timeline on which long-horizon agentic autonomy becomes a standard capability across frontier models.
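
As a rough check on that doubling pace, the sketch below converts the two horizon figures from this brief (a 1.5-hour baseline and a 3-hour-6-minute milestone) into a number of doublings, and then into the months that many doublings would take at four months each. It assumes clean exponential growth, which is a simplification and not METR's methodology; the function names `doublings_between` and `months_at_pace` are hypothetical and exist only for this illustration.

```python
import math


def doublings_between(start_hours: float, end_hours: float) -> float:
    """Number of doublings needed to grow from start_hours to end_hours."""
    return math.log2(end_hours / start_hours)


def months_at_pace(doublings: float, months_per_doubling: float = 4.0) -> float:
    """Months implied by a number of doublings at a fixed doubling period."""
    return doublings * months_per_doubling


# Figures taken from the brief; the four-month period is the stated pace.
baseline_hours = 1.5
milestone_hours = 3 + 6 / 60  # 3 hours 6 minutes

n = doublings_between(baseline_hours, milestone_hours)
months = months_at_pace(n)

print(f"{n:.2f} doublings, about {months:.1f} months at the assumed pace")
```

Going from 1.5 hours to just over 3 hours is only slightly more than one doubling, so at the stated pace the gap between the two figures corresponds to a little over four months.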