Anthropic: Claude Now Authors >80% of Merged Code as Recursive Self-Improvement Metrics Accelerate

Fresh data from Anthropic’s Institute post “When AI builds itself” (circulating widely this week) quantifies how quickly Claude is taking over Anthropic’s own R&D loop. The numbers point to compounding automation in frontier AI development.

Core Metrics (as of May 2026)

>80% of code merged into Anthropic’s production codebase is now authored by Claude, up from low single-digit percentages before the February 2025 Claude Code research preview.
Engineer output (lines of code merged per engineer per day) has risen 8x since 2024. It had been essentially flat from 2021 to 2024; the curve steepened sharply once Claude began writing and editing entire files autonomously.
Task-length horizon (the length of task Claude can reliably complete autonomously) is roughly doubling every 4 months:
March 2024 (Claude Opus 3): ~4-minute tasks
March 2025 (Claude Sonnet 3.7): ~1.5-hour tasks
March 2026 (Claude Opus 4.6): ~12-hour tasks
Trend line points to multi-day tasks later in 2026 and week-scale work in 2027.
Experimental speedups on defined research optimization loops reached ~52x (Claude Mythos Preview, April 2026). A skilled human researcher typically needs 4–8 hours to achieve ~4x on the same class of task.
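As a back-of-the-envelope check on the doubling claim, the sketch below computes the doubling time implied by the three reported task-length data points and extrapolates forward from the March 2026 figure. It assumes simple exponential growth between points and treats each "March" figure as a single date (March 1); the helper names are illustrative and do not come from the post.

```python
import math
from datetime import date

# Task-length horizon as reported in the post, converted to minutes.
# Pinning each "March" figure to March 1 is a simplifying assumption.
HORIZON_POINTS = [
    (date(2024, 3, 1), 4.0),    # Claude Opus 3: ~4 minutes
    (date(2025, 3, 1), 90.0),   # Claude Sonnet 3.7: ~1.5 hours
    (date(2026, 3, 1), 720.0),  # Claude Opus 4.6: ~12 hours
]


def months_between(a: date, b: date) -> int:
    """Whole calendar months from a to b (day of month ignored)."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def doubling_time_months(t0: date, m0: float, t1: date, m1: float) -> float:
    """Implied doubling time in months, assuming exponential growth from m0 to m1."""
    doublings = math.log2(m1 / m0)
    return months_between(t0, t1) / doublings


def extrapolate_minutes(start_minutes: float, months_ahead: float, doubling_months: float) -> float:
    """Project the horizon forward under a constant doubling time."""
    return start_minutes * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Implied doubling time for each consecutive pair of data points.
    for (t0, m0), (t1, m1) in zip(HORIZON_POINTS, HORIZON_POINTS[1:]):
        months = doubling_time_months(t0, m0, t1, m1)
        print(f"{t0:%b %Y} -> {t1:%b %Y}: doubling every {months:.1f} months")

    # Forward projection from the latest point, using the stated 4-month doubling.
    _, last_minutes = HORIZON_POINTS[-1]
    for months_ahead in (4, 8, 12, 16):
        hours = extrapolate_minutes(last_minutes, months_ahead, 4.0) / 60
        print(f"+{months_ahead} months: ~{hours:.0f} hours of task length")
```

On these numbers, the 2024 to 2025 interval implies a faster doubling of roughly 2.7 months, while the 2025 to 2026 interval matches the stated 4 months exactly (an 8x rise is three doublings in 12 months). Holding a 4-month doubling constant takes the 12-hour horizon to about 48 hours by late 2026 and about 192 hours by mid-2027, which is consistent with the post’s "multi-day" and "week-scale" extrapolation.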

Additional Signals of Velocity

Session success rate on open-ended coding tasks hit 76% in May 2026 (up 50 percentage points in six months).
In one April 2026 API error-reduction sprint, Claude shipped >800 fixes that cut error rates by a factor of 1,000 — work a human engineer estimated would have taken roughly four years.
Code quality has reached parity with senior human engineers (in late 2025 it still lagged slightly); Anthropic expects Claude to pull ahead within the next year.
Internal researcher poll (March 2026) showed a median estimate of ~4x output increase when using the latest Claude preview.

What This Means for AI R&D

These figures come from internal attribution pipelines and controlled experiments, not just benchmarks. SWE-bench and CORE-Bench (research reproduction) have moved from low performance to near-saturation in roughly two years and 15 months, respectively. The pattern is consistent: once models can reliably handle longer-horizon tasks with minimal human scaffolding, the iteration loop tightens dramatically.

Anthropic frames this as early but measurable progress toward recursive self-improvement — where AI systems increasingly design, implement, test, and improve their own successors. Human researchers still set high-level direction and exercise judgment on problem selection and evaluation rubrics, but execution velocity is shifting fast.

Actionable implications for other labs and infra teams:

Coding-agent adoption curves are steepening; labs that still make only light use of these tools are likely seeing their productivity gap widen.
R&D throughput is becoming more compute-bound than human-bound on well-scoped tasks.
Bottlenecks are migrating upstream (idea generation, evaluation design, long-term research taste) and downstream (review, integration, safety validation).
The same acceleration that delivers 52x experimental speedups also compresses the timeline for capability jumps — relevant for both capability forecasting and alignment work.

The post (co-authored by Marina Favaro and Jack Clark) is explicit that full recursive self-improvement remains uncertain and carries meaningful control risks, but the internal data shows the trend is already material inside one of the leading labs.

Direct link: https://www.anthropic.com/institute/recursive-self-improvement

Data like this is now the clearest public window into how quickly frontier AI organizations are automating their own core work.
