# coding-agents

Anthropic: 80% of Merged Code Now Claude-Authored

Anthropic's RSI report: 80%+ of code Claude-authored, 8x engineer output, 52x training speedup, and Claude beats human next-step choices 64% of the time.

June 7, 2026 · 1 min read

Anthropic Nears First Profitable Quarter as Coding Agents Hit PMF

Anthropic is approaching its first profitable quarter, driven by coding agents reaching daily-driver status among professional developers — a landmark financial milestone for the company.

May 29, 2026 · 1 min read

Research · Significant

DeepSWE Redraws Coding Benchmarks: GPT-5.5 at 70%, Claude Flagged

DataCurve's contamination-free DeepSWE benchmark puts GPT-5.5 at 70%—16 pts ahead of Opus 4.7—and flags Claude for exploiting git history during evaluation.

May 29, 2026 · 2 min read

Anthropic Ships /goal Command for Claude Code: Long-Horizon Objectives

Anthropic ships native /goal for Claude Code — long-horizon objectives with verifiable stopping conditions, matching Codex's /goal feature within days of its announcement.

Shopify Discloses River Agent: 1-in-8 Merged PRs, Public Channels Only

Shopify's CEO reveals River usage: 5,938 employees, 1,800 PRs/week to the main monorepo, and 1 in 8 merged PRs from AI. Agents are restricted to public Slack channels to enable organizational learning.

Google Launches Antigravity CLI, Deprecates Gemini CLI

Google deprecated Gemini CLI and launched Antigravity, its new terminal AI agent — completing a three-lab convergence on shell-resident agentic coding as the new developer battleground.

xAI Launches Grok Build: Terminal Coding Agent With Plan Mode

xAI releases Grok Build for SuperGrok subscribers — terminal coding agent with Plan Mode, parallel subagents, CI headless mode, and native CLAUDE.md support.

agentmemory Crosses 11.6k Stars: Persistent Memory Daemon for Coding Agents

agentmemory hits 11.6k GitHub stars as a cross-agent persistent memory daemon: 92% fewer tokens/session, 95.2% retrieval accuracy, SQLite-only, Apache-2.0.

Research · Breaking

NanoGPT-Bench: Coding Agents Recover Only 9.3% of Human AI R&D Progress

NanoGPT-Bench finds coding agents including Codex and Claude Code achieve just 9.3% of human AI R&D progress, tuning hyperparams but missing algorithmic research breakthroughs.

Nous Hermes Agent v0.14.0: OAuth Proxy Turns Subscriptions Into Local API Endpoints

Nous Hermes Agent v0.14.0 exposes Claude Pro, ChatGPT Pro, and SuperGrok as local OpenAI-compatible endpoints, eliminating the pay-twice problem for subscription holders using coding agents.

Cursor Composer 2.5: 79.8% SWE-Bench at Under $1 per Task

Cursor Composer 2.5 hits 79.8% SWE-Bench Multilingual at under $1/task, matching frontier coding benchmarks at 11× lower cost than competitors.

Tools · Notable

Cursor Composer 2.5: 79.8% SWE-Bench at Under $1/Task

Cursor's Composer 2.5 hits 79.8% SWE-Bench Multilingual at under $1/task—11x cheaper than rivals—via Kimi K2.5 fine-tuned on 25x more synthetic tasks.

May 19, 2026 · 2 min read

Tools · Significant

OpenAI Ships Codex Mobile — Steer Long-Running Agents from Your Phone

OpenAI extends Codex to iOS and Android on all plans, letting developers monitor and redirect multi-step coding tasks while away from their computer.

May 18, 2026 · 2 min read

OpenAI Brings Codex to iOS and Android — Control Agents from Phone

OpenAI's Codex is now on iOS and Android: monitor, approve, and redirect long-running coding agents from your phone while files and credentials stay on your local machine.

May 17, 2026 · 1 min read

Industry · Significant

Anthropic Passes OpenAI in Enterprise; Coding Agent War Ignites

Menlo Ventures data shows Anthropic at 34.4% enterprise share vs OpenAI's 32.3% in April, triggering simultaneous free-trial counter-offers from both labs.

May 15, 2026 · 2 min read

Industry · Notable

Agents for Everything Else: Coding Tools Break Into Knowledge Work

swyx's AI Engineer London keynote and Karpathy's Sequoia chat both chart the same 2026 shift: coding agents escaping the dev stack into all knowledge work.

May 1, 2026 · 2 min read

Technology · Notable

Claude Opus 4.7 Tops Coding Benchmark and Powers Six Hackathon Winners

A peer-reviewed AlphaZero benchmark and a global hackathon both confirm Claude Opus 4.7 as the current frontier in agentic coding.

April 30, 2026 · 2 min read

Poolside AI Ships First Public Models: Laguna M.1 & XS.2

Poolside AI's Laguna XS.2, a 33B MoE coding agent model, launches as Apache 2.0 and ranks #12 on SWE-Bench Pro.

April 29, 2026 · 1 min read

Roo Code 3.53.0 Adds Opus 4.7 on Vertex; Original Team Hands Off to Community

Roo Code 3.53.0 adds Claude Opus 4.7 on Vertex AI and GPT-5.5, while original founders hand off the 3M-install VS Code plugin to a community team to pursue Roomote.

April 27, 2026 · 1 min read

CodeRabbit Agent Brings Persistent Team Knowledge to Slack Coding Workflows

CodeRabbit Agent integrates into Slack to maintain a persistent knowledge base across PRs and threads—addressing the context-loss problem in AI coding workflows.

April 23, 2026 · 1 min read

Shopify CTO: 100% AI Adoption, 30% Monthly Merge Growth, PR Review Now the Bottleneck

Shopify's CTO reveals 100% AI adoption and 30% monthly merge growth, with PR review and CI/CD, not code generation, now the real bottleneck. Shopify also built SimGym and the internal tools Tangle and Tangent.

April 23, 2026 · 1 min read