Diversity Collapse Paper Formalizes Multi-Agent LLM Failure Mode

New research formalizes what many practitioners have suspected: multi-agent LLM systems converge on near-identical outputs over time, regardless of starting prompts or model architecture. The cause is structural coupling: shared context, shared task definitions, and shared feedback mechanisms. The paper quantifies this homogenization with the Vendi score, a similarity-based diversity metric, and finds it across diverse architectures and configurations.
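For readers unfamiliar with the metric: the Vendi score of n items is the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n-by-n positive semidefinite similarity matrix with ones on its diagonal. The sketch below is a minimal, dependency-free illustration of that definition, not the paper's code. The function names, the choice of cosine similarity over embedding vectors, and the hand-rolled Jacobi eigenvalue routine are all assumptions made for illustration; in practice a library eigensolver such as `numpy.linalg.eigvalsh` would normally replace the Jacobi helper.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations.

    Illustrative helper only; a library eigensolver is the usual choice.
    """
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity; assumes every vector is nonzero."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(x * y for x, y in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def vendi_score(similarity):
    """exp(Shannon entropy of the eigenvalues of K / n)."""
    n = len(similarity)
    scaled = [[value / n for value in row] for row in similarity]
    entropy = 0.0
    for lam in symmetric_eigenvalues(scaled):
        if lam > 1e-12:  # treat 0 * log(0) as 0 and ignore round-off
            entropy -= lam * math.log(lam)
    return math.exp(entropy)
```

The score ranges from 1 (all outputs identical) to n (all outputs mutually dissimilar), so it can be read as an effective number of distinct outputs. For example, passing hypothetical agent-output embeddings through `cosine_similarity_matrix` and then `vendi_score` at each round would show the value drifting toward 1 if the agents are homogenizing.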

Why It Matters

This directly challenges the premise behind most brainstorming, ideation, and multi-perspective multi-agent setups: diversity does not come for free. For teams building swarm orchestration or multi-agent reasoning systems, the practical fix is to engineer isolation between agents explicitly: decoupled evaluation, heterogeneous starting conditions, and separated feedback loops. Without these, you're paying for N agents and getting the output of one.
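One way to picture that isolation is sketched below. This is an illustrative assumption about how the three ingredients could be wired together, not the paper's method: `IsolatedAgent`, `run_isolated_round`, and the `generate` and `critic` callables are hypothetical names standing in for whatever model calls and evaluators a real system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: (system_prompt, private_history, task) -> draft text.
GenerateFn = Callable[[str, List[str], str], str]
# Hypothetical interface: a critic scores or comments on a single draft.
CriticFn = Callable[[str], str]


@dataclass
class IsolatedAgent:
    name: str
    system_prompt: str  # heterogeneous starting condition, one per agent
    generate: GenerateFn
    history: List[str] = field(default_factory=list)  # private, never shared


def run_isolated_round(
    agents: List[IsolatedAgent], task: str, critic: CriticFn
) -> Dict[str, str]:
    """Run one round in which no agent sees another agent's output."""
    outputs: Dict[str, str] = {}
    for agent in agents:
        # The agent sees only its own prompt and a copy of its own history.
        draft = agent.generate(agent.system_prompt, list(agent.history), task)
        # Decoupled evaluation: the critic sees one draft at a time,
        # so its feedback cannot pull drafts toward each other.
        feedback = critic(draft)
        # Separated feedback loop: feedback goes back to this agent only.
        agent.history.extend([draft, feedback])
        outputs[agent.name] = draft
    return outputs
```

The design choice here is that the only shared input is the task string; prompts, histories, and feedback stay per-agent. Outputs are merged only after the round ends, at which point a diversity metric can be computed over them.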