Turing's Open MM-RL Hits #1 Trending on HuggingFace with PhD-Level STEM Benchmark

Turing released Open MM-RL, a multimodal STEM benchmark targeting PhD-level difficulty across Physics, Chemistry, Biology, and Mathematics. Every answer is 100% deterministically verifiable (no vibes-based grading), and each prompt was double-vetted by PhD domain specialists. The dataset supports single-image, multi-panel, and multi-image task formats to scale task complexity. It hit #1 trending on HuggingFace upon release, with 3,000 additional off-the-shelf tasks announced as coming soon.
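Deterministic verifiability typically means a model's answer can be checked by exact or canonicalized comparison against stored ground truth, rather than by an LLM judge. A minimal sketch of what such a checker could look like (the function names and normalization rules here are illustrative assumptions, not Open MM-RL's actual grading code):

```python
# Illustrative sketch of a deterministic answer checker. Assumes each task
# stores a canonical ground-truth string; the normalization rules below are
# hypothetical, not Open MM-RL's actual implementation.
from fractions import Fraction

def normalize(ans: str) -> str:
    """Canonicalize an answer: trim whitespace, lowercase, and reduce
    numeric answers to an exact rational form."""
    s = ans.strip().lower()
    try:
        # "0.50", "1/2", and "2/4" all canonicalize to "1/2"
        return str(Fraction(s))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers compare as normalized text
        return s

def is_correct(model_answer: str, ground_truth: str) -> bool:
    """Deterministic pass/fail: identical inputs always yield the same verdict."""
    return normalize(model_answer) == normalize(ground_truth)

print(is_correct("0.5", "1/2"))    # True
print(is_correct(" H2O ", "h2o"))  # True
print(is_correct("3.14", "22/7"))  # False: 157/50 != 22/7
```

The point of the design is that grading is a pure function of the answer pair, so benchmark scores are exactly reproducible across runs and evaluators.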

Why It Matters

PhD-level multimodal STEM evaluation with verifiable ground truth closes a major gap in frontier model testing. As models approach human-expert performance on existing benchmarks, deterministic PhD-level evaluation becomes essential for detecting genuine capability improvement or regression rather than grader noise.