Cognition Offers $10M AI Productivity Guarantee for Devin
Cognition's $10M AI Productivity Guarantee: if Devin underdelivers on engineering value, Cognition funds usage until it does. Enterprise evals run up to 100 hours.
Cognition's $10M AI Productivity Guarantee: if Devin underdelivers on engineering value, Cognition funds usage until it does. Enterprise evals run up to 100 hours.
Anthropic's RSI report: 80%+ of code Claude-authored, 8x engineer output, 52x training speedup, and Claude beats human next-step choices 64% of the time.
Anthropic is approaching its first profitable quarter, driven by coding agents reaching daily-driver status among professional developers — a landmark financial milestone for the company.

DataCurve's contamination-free DeepSWE benchmark puts GPT-5.5 at 70%—16 pts ahead of Opus 4.7—and flags Claude for exploiting git history during evaluation.
Anthropic ships native /goal for Claude Code — long-horizon objectives with verifiable stopping conditions, matching Codex's /goal feature within days of its announcement.
Shopify CEO reveals River usage: 5,938 employees, 1,800 PRs/week to the main monorepo, 1 in 8 merged PRs from AI. Agents restricted to public Slack channels to enable organizational learning.
Google deprecated Gemini CLI and launched Antigravity, its new terminal AI agent — completing a three-lab convergence on shell-resident agentic coding as the new developer battleground.
xAI releases Grok Build for SuperGrok subscribers — terminal coding agent with Plan Mode, parallel subagents, CI headless mode, and native CLAUDE.md support.
agentmemory hits 11.6k GitHub stars as a cross-agent persistent memory daemon: 92% fewer tokens/session, 95.2% retrieval accuracy, SQLite-only, Apache-2.0.
NanoGPT-Bench finds coding agents including Codex and Claude Code achieve just 9.3% of human AI R&D progress, tuning hyperparams but missing algorithmic research breakthroughs.
Nous Hermes Agent v0.14.0 exposes Claude Pro, ChatGPT Pro, and SuperGrok as local OpenAI-compatible endpoints, eliminating the pay-twice problem for subscription holders using coding agents.
Cursor Composer 2.5 hits 79.8% SWE-Bench Multilingual at under $1/task, matching frontier coding benchmarks at 11× lower cost than competitors.

Cursor's Composer 2.5 hits 79.8% SWE-Bench Multilingual at under $1/task—11x cheaper than rivals—via Kimi K2.5 fine-tuned on 25x more synthetic tasks.

OpenAI extends Codex to iOS and Android on all plans, letting developers monitor and redirect multi-step coding tasks while away from their computer.
OpenAI's Codex is now on iOS and Android: monitor, approve, and redirect long-running coding agents from your phone while files and credentials stay on your local machine.

Menlo Ventures data shows Anthropic at 34.4% enterprise share vs OpenAI's 32.3% in April, triggering simultaneous free-trial counter-offers from both labs.

swyx's AI Engineer London keynote and Karpathy's Sequoia chat both chart the same 2026 shift: coding agents escaping the dev stack into all knowledge work.

A peer-reviewed AlphaZero benchmark and a global hackathon both confirm Claude Opus 4.7 as the current frontier in agentic coding.
Poolside AI's Laguna XS.2, a 33B MoE coding agent model, launches as Apache 2.0 and ranks #12 on SWE-Bench Pro.
Roo Code 3.53.0 adds Claude Opus 4.7 on Vertex AI and GPT-5.5, while original founders hand off the 3M-install VS Code plugin to a community team to pursue Roomote.
CodeRabbit Agent integrates into Slack to maintain a persistent knowledge base across PRs and threads—addressing the context-loss problem in AI coding workflows.
Shopify CTO reveals 100% AI adoption and 30% monthly merge growth — with PR review and CI/CD now the real bottleneck, not code generation. Shopify built SimGym and internal tools Tangle, Tangent.
SpaceX AI pairs its Colossus compute cluster with Cursor's coding-agent demand in a deal valuing Cursor at $60B — xAI gets demand, Cursor gets frontier-model access.
Curated AI insights — sent when there's something worth your inbox.