Mercor APEX-Agents Benchmark Gets Hugging Face Leaderboard for Open-Source Models
Mercor's APEX-Agents benchmark—a frontier evaluation designed to test whether AI models can perform the real work of consultants, lawyers, and bankers—now has an official Hugging Face leaderboard for open-source models. The dataset is publicly available, enabling any team to evaluate open-weight models against professional knowledge-work tasks and compare results on a standardized leaderboard.
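The evaluate-and-compare workflow described above can be sketched in broad strokes. Everything in this example is an illustrative assumption: the task fields, the keyword-rubric scoring, and the stub "model" are hypothetical stand-ins, not the actual APEX-Agents schema or grading method.

```python
# Hypothetical sketch of scoring an open-weight model on
# knowledge-work tasks. Field names ("prompt", "rubric") and
# keyword-based scoring are assumptions for illustration only.

def score_response(response: str, rubric_keywords: list[str]) -> float:
    """Fraction of rubric keywords covered by the model's response."""
    text = response.lower()
    hits = sum(1 for kw in rubric_keywords if kw.lower() in text)
    return hits / len(rubric_keywords) if rubric_keywords else 0.0

def evaluate(tasks: list[dict], generate) -> float:
    """Run a model (``generate``: prompt -> str) over benchmark tasks
    and return the mean score, as a leaderboard entry might report."""
    scores = [score_response(generate(t["prompt"]), t["rubric"])
              for t in tasks]
    return sum(scores) / len(scores)

# Toy run: a stub "model" evaluated against two hypothetical tasks.
tasks = [
    {"prompt": "Summarize the indemnification clause.",
     "rubric": ["indemnify", "liability"]},
    {"prompt": "Outline a DCF valuation.",
     "rubric": ["discount rate", "free cash flow"]},
]
stub = lambda p: "Each party shall indemnify the other against liability, funded by free cash flow."
print(round(evaluate(tasks, stub), 2))
```

In a real run, the stub would be replaced by inference against an open-weight model and the tasks loaded from the published dataset; only the aggregate score would be submitted for leaderboard comparison.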
Why It Matters
APEX-Agents fills a benchmark gap: most agent evals focus on coding or math, while professional knowledge work (legal analysis, financial modeling, consulting strategy) has lacked a standardized open evaluation. The Hugging Face leaderboard makes it easy to track which open models are closing the gap with proprietary models on these enterprise-relevant tasks.