Sakana AI Conductor: 7B RL Orchestrator Hits SOTA on GPQA-Diamond at ICLR 2026

Sakana AI's Conductor model, accepted at ICLR 2026, is a 7-billion-parameter model trained via reinforcement learning to design multi-agent topologies and prompt-engineer the instructions for each worker agent in the system. Rather than solving tasks directly, Conductor routes tasks to other large language models and generates their instructions. Results: state-of-the-art performance on GPQA-Diamond (expert-level science QA) and LiveCodeBench, with recursive topologies emerging autonomously when self-routing is allowed. Routing alone yields individual worker gains of ~3% on AIME25 and GPQA-Diamond, comparable to a full generational model upgrade.
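The pattern described above can be sketched in miniature: an orchestrator policy that, given a task, chooses a topology (which workers, in what order) and writes each worker's instruction, then dispatches. This is a minimal illustrative sketch, not Sakana AI's implementation; all names (`Orchestrator`, `Worker`, `plan`) and the hard-coded two-stage topology are assumptions made for clarity.

```python
# Hypothetical sketch of the learned-orchestrator pattern.
# In Conductor, the plan() step would be produced by the trained 7B policy;
# here it is hard-coded to keep the example self-contained.

from dataclasses import dataclass


@dataclass
class Worker:
    """Stand-in for a worker LLM that receives an instruction and a task."""
    name: str

    def run(self, instruction: str, task: str) -> str:
        # A real worker would call a language model here.
        return f"[{self.name}] task={task!r} instruction={instruction!r}"


@dataclass
class Orchestrator:
    """Stand-in for the learned policy: it designs the topology and
    prompt-engineers each worker's instruction, but never solves the task."""
    workers: dict

    def plan(self, task: str) -> list:
        # Illustrative fixed topology: a solver followed by a critic.
        # The learned model would instead emit this plan token by token,
        # and could route back to itself (yielding recursive topologies).
        return [
            ("solver", f"Solve step by step: {task}"),
            ("critic", f"Check the solver's reasoning for: {task}"),
        ]

    def execute(self, task: str) -> list:
        transcript = []
        for worker_name, instruction in self.plan(task):
            transcript.append(self.workers[worker_name].run(instruction, task))
        return transcript


workers = {"solver": Worker("solver"), "critic": Worker("critic")}
conductor = Orchestrator(workers)
result = conductor.execute("What is 17 * 23?")
for turn in result:
    print(turn)
```

The key design point the sketch mirrors is the separation of roles: the orchestrator's output is a plan (topology plus instructions), and all task-solving capability lives in the workers it routes to.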

Why It Matters

Conductor is the strongest argument yet that "the orchestrator should itself be a learned model" rather than hand-engineered scaffolding. A 7B model achieving SOTA through coordination rather than raw capability suggests a new scaling axis — learned orchestration — that is distinct from both pretraining scale and fine-tuning.