Skill-RAG Triggers Retrieval Only When LLM Is About to Fail

Skill-RAG replaces the monolithic retrieve-always RAG pipeline with a failure-detection architecture: a probe trained on LLM hidden states predicts whether the model is about to produce an incorrect answer. Retrieval only fires when the probe detects imminent failure, and different failure modes (factual gaps, multi-hop reasoning failures, temporal knowledge gaps) are routed to different specialized retrieval skills. The result beats both standard RAG and full-context retrieval on HotpotQA, Natural Questions, and TriviaQA across efficiency and accuracy.

Why It Matters

Skill-RAG reframes the core RAG design question from "what is our retriever architecture?" to "when should we retrieve, and which retrieval skill is appropriate?" This shift — from monolithic pipeline to composable primitives selected by failure mode — is directly applicable to any production agentic system where retrieval cost and latency matter.