LangChain DeepAgents Harness Profiles: 10–20pt Benchmark Jump

LangChain released Harness Profiles for its DeepAgents framework: per-provider and per-model overrides for base system prompts, tool names, middleware, and behavior constraints. In LangChain's internal testing, profiles improved tau2-bench scores by 10–20 points over default configurations. Out-of-the-box profiles ship for the OpenAI, Anthropic, and Google model families. The harness is now a first-class, versioned object that can be diffed and swapped independently of model selection.
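To make the layering concrete, here is a minimal Python sketch of how per-provider and per-model overrides might resolve into one effective harness, and how two harnesses could be diffed. It is not the DeepAgents API: the HarnessProfile class, its fields, the merge, resolve, and diff helpers, and the provider and model names are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class HarnessProfile:
    """Hypothetical profile; None or an empty dict means 'inherit'."""
    version: str
    system_prompt: Optional[str] = None
    tool_names: dict = field(default_factory=dict)  # canonical -> model-facing
    middleware: Optional[tuple] = None
    max_steps: Optional[int] = None  # stand-in for a behavior constraint


def merge(base: HarnessProfile, override: HarnessProfile) -> HarnessProfile:
    """Layer `override` on top of `base`; unset fields fall through."""
    def pick(name):
        value = getattr(override, name)
        return value if value is not None else getattr(base, name)

    return HarnessProfile(
        version=f"{base.version}+{override.version}",
        system_prompt=pick("system_prompt"),
        tool_names={**base.tool_names, **override.tool_names},
        middleware=pick("middleware"),
        max_steps=pick("max_steps"),
    )


# Illustrative data only; these names are assumptions, not shipped profiles.
DEFAULT = HarnessProfile(
    version="default-1",
    system_prompt="You are a careful agent.",
    tool_names={"read_file": "read_file"},
    middleware=("summarize",),
    max_steps=50,
)
PROVIDER_PROFILES = {
    "provider_a": HarnessProfile(version="provider_a-1",
                                 tool_names={"read_file": "Read"}),
}
MODEL_PROFILES = {
    "provider_a/small": HarnessProfile(version="small-2", max_steps=20),
}


def resolve(provider: str, model: str) -> HarnessProfile:
    """Apply default, then provider, then model overrides, in that order."""
    profile = DEFAULT
    for layer in (PROVIDER_PROFILES.get(provider), MODEL_PROFILES.get(model)):
        if layer is not None:
            profile = merge(profile, layer)
    return profile


def diff(a: HarnessProfile, b: HarnessProfile) -> dict:
    """Return {field: (a_value, b_value)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


if __name__ == "__main__":
    effective = resolve("provider_a", "provider_a/small")
    print(effective.version)
    for name, (old, new) in diff(DEFAULT, effective).items():
        print(f"{name}: {old!r} -> {new!r}")
```

Because the composed version string records every layer that was applied, a benchmark result can cite it next to the model name, which is the kind of reporting the versioned-harness idea makes possible.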

Why It Matters

Formalizing the harness as a benchmarkable, versioned object is likely to reshape how AI agent performance is published. The framing is that "benchmarking a model without specifying the harness is like benchmarking a chip without specifying the compiler," which implies that evaluation reports across the industry should name the harness and its version alongside the model.