Study: AI Agents Ignored Gathered Evidence in 68% of Cases

A new study titled "Evidence None of It," published on arXiv, found that AI agents conducting research tasks gathered evidence through tool use but failed to incorporate it into their conclusions in 68% of evaluated cases. In 71% of cases, agents never updated their beliefs based on the retrieved information: they completed the full research loop yet produced outputs that ignored the evidence entirely. The study used a systematic evaluation framework across multiple agent architectures.

Why It Matters

This is a direct empirical challenge to the growing "AI scientist" framing used to justify autonomous research agents in medicine, drug discovery, and scientific literature analysis. Any agentic workflow that assumes the model integrates retrieved evidence into its conclusions should be audited against this finding. The number is too high to dismiss as edge-case behavior; it appears to describe a systematic architectural failure in how current agents handle retrieved context.
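One minimal form such an audit could take is a lexical-overlap check: flag any agent run whose final answer shares essentially no content words with the evidence it retrieved. This is a crude illustrative sketch, not the study's evaluation framework, and all function names here are hypothetical:

```python
def evidence_overlap(answer: str, evidence_snippets: list[str]) -> float:
    """Fraction of retrieved snippets sharing at least one content word with the answer.

    Crude heuristic: treats words longer than three characters as content words.
    """
    answer_words = {w.lower().strip(".,;:!?") for w in answer.split() if len(w) > 3}
    if not evidence_snippets:
        return 0.0
    used = 0
    for snippet in evidence_snippets:
        snippet_words = {w.lower().strip(".,;:!?") for w in snippet.split() if len(w) > 3}
        if answer_words & snippet_words:
            used += 1
    return used / len(evidence_snippets)


def flag_ignored_evidence(answer: str, snippets: list[str], threshold: float = 0.5) -> bool:
    """Flag a run for human review when most retrieved evidence is unreflected in the answer."""
    return evidence_overlap(answer, snippets) < threshold
```

A real audit would need something stronger than word overlap (paraphrase and entailment checks, for instance), but even this cheap filter surfaces runs where the agent's output is lexically disconnected from everything it retrieved.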