Sakana AI Conductor: 7B RL Orchestrator Hits SOTA on GPQA-Diamond at ICLR 2026

Sakana AI's Conductor model, accepted at ICLR 2026, is a 7-billion-parameter model trained via reinforcement learning to design multi-agent topologies and prompt-engineer the instructions for each worker agent in the system. Rather than solving tasks directly, Conductor routes tasks to other large language models and generates their instructions. Results: state-of-the-art performance on GPQA-Diamond (expert-level science QA) and LiveCodeBench, with recursive topologies emerging autonomously when self-routing is allowed. Routing alone yields individual worker gains of ~3% on AIME25 and GPQA-Diamond, comparable to a full generational model upgrade.
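The pattern described above can be sketched in miniature: an orchestrator policy that, given a task, chooses a topology (which workers, in what order) and writes each worker's instruction, then dispatches. This is a minimal illustrative sketch, not Sakana AI's implementation; all names (`Orchestrator`, `Worker`, `plan`) and the hard-coded two-stage topology are assumptions made for clarity.

```python
# Hypothetical sketch of the learned-orchestrator pattern.
# In Conductor, the plan() step would be produced by the trained 7B policy;
# here it is hard-coded to keep the example self-contained.

from dataclasses import dataclass


@dataclass
class Worker:
    """Stand-in for a worker LLM that receives an instruction and a task."""
    name: str

    def run(self, instruction: str, task: str) -> str:
        # A real worker would call a language model here.
        return f"[{self.name}] task={task!r} instruction={instruction!r}"


@dataclass
class Orchestrator:
    """Stand-in for the learned policy: it designs the topology and
    prompt-engineers each worker's instruction, but never solves the task."""
    workers: dict

    def plan(self, task: str) -> list:
        # Illustrative fixed topology: a solver followed by a critic.
        # The learned model would instead emit this plan token by token,
        # and could route back to itself (yielding recursive topologies).
        return [
            ("solver", f"Solve step by step: {task}"),
            ("critic", f"Check the solver's reasoning for: {task}"),
        ]

    def execute(self, task: str) -> list:
        transcript = []
        for worker_name, instruction in self.plan(task):
            transcript.append(self.workers[worker_name].run(instruction, task))
        return transcript


workers = {"solver": Worker("solver"), "critic": Worker("critic")}
conductor = Orchestrator(workers)
result = conductor.execute("What is 17 * 23?")
for turn in result:
    print(turn)
```

The key design point the sketch mirrors is the separation of roles: the orchestrator's output is a plan (topology plus instructions), and all task-solving capability lives in the workers it routes to.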

Why It Matters

Conductor is the strongest argument yet that "the orchestrator should itself be a learned model" rather than hand-engineered scaffolding. A 7B model achieving SOTA through coordination rather than raw capability suggests a new scaling axis — learned orchestration — that is distinct from both pretraining scale and fine-tuning.