Summary#
The longitudinal finding of Anthropic's 400K-session study: over just seven months (Oct 2025 → Apr 2026), what people do with Claude Code shifted measurably away from firefighting and toward end-to-end agentic work — and the work got more valuable. The clearest single move: the share of sessions spent fixing broken code fell from 33% to 19% (the debugging share nearly halved). In its place grew the work around code — operating software 14%→21%, and writing + data analysis roughly doubling, ~10%→~20%. Meanwhile the estimated economic value of the average session rose ~25–27%. This is harness shrinkage seen from the usage side: as the model got more capable, users spent less time un-breaking things and more time delegating whole tasks.
Evidence note: empirical, same first-party caveat as Returns to Expertise in Agentic Coding (Anthropic measuring its own product via Clio + Sonnet-4.6 classifiers, validated against telemetry; excludes headless/SDK/IDE usage). Task value is a deliberately relative proxy; see below.
The nine work modes#
Every session is classified into the single mode that best describes it. The taxonomy itself is worth recording:
- Code-direct (≈56%): building new (25%), fixing broken (26%), testing/orchestrating (5%)
- Operating (17%): deploying, configuring, running pipelines, monitoring
- Plan/explore (14%): understanding an existing system, planning a change
- Analysis/prose (13%): analyzing data, communicating via docs/presentations
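As a quick consistency check on the shares listed above, the sketch below stores them in a nested mapping and sums them. The key names are illustrative labels, not the report's identifiers, and only the code-direct group is broken out because the page gives no split for the other three.

```python
# Session shares by work-mode group, in percent, as listed above.
# Key names are illustrative labels chosen for this sketch.
WORK_MODE_SHARES = {
    "code_direct": {
        "building_new": 25,
        "fixing_broken": 26,
        "testing_orchestrating": 5,
    },
    "operating": {"all": 17},
    "plan_explore": {"all": 14},
    "analysis_prose": {"all": 13},
}


def group_totals(shares):
    """Sum the sub-mode shares inside each group."""
    return {group: sum(sub.values()) for group, sub in shares.items()}


totals = group_totals(WORK_MODE_SHARES)
grand_total = sum(totals.values())

for group, pct in totals.items():
    print(f"{group}: {pct}%")
print(f"all groups: {grand_total}%")
```

The code-direct sub-shares add up to the ≈56% quoted for the group, and the four groups together account for all sessions, which is what a single-label classification should produce.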
The classifier agrees with automatic telemetry — >90% of sessions it labeled as creating/modifying code showed actual code changes — which is the validation anchor for trusting the rest of the transcript-based measures.
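The validation described above amounts to a precision-style check: of the sessions the classifier labeled as code-modifying, what fraction does telemetry confirm? A minimal sketch follows, assuming a hypothetical `Session` record with two boolean fields; the field names and the toy data are invented for illustration and are not the study's schema or results.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical fields; the study's actual schema is not given in the source.
    labeled_code_change: bool    # classifier says the session created/modified code
    telemetry_code_change: bool  # telemetry recorded an actual code change


def label_agreement(sessions):
    """Share of classifier-labeled code sessions that telemetry confirms."""
    labeled = [s for s in sessions if s.labeled_code_change]
    if not labeled:
        return None  # nothing to validate
    confirmed = sum(s.telemetry_code_change for s in labeled)
    return confirmed / len(labeled)


# Toy data: 19 confirmed labels, 1 unconfirmed label, 5 sessions not labeled as code.
toy_sessions = (
    [Session(True, True)] * 19
    + [Session(True, False)]
    + [Session(False, False)] * 5
)
agreement = label_agreement(toy_sessions)
print(f"agreement on labeled sessions: {agreement:.0%}")
```

A check of this shape runs in one direction only: it says nothing about code-changing sessions the classifier failed to label.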
The shift, quantified (Oct 2025 → Apr 2026)#
| Work mode | Direction | Change |
|---|---|---|
| Fixing broken code | ↓ | 33% → 19% |
| Operating software | ↑ | 14% → 21% |
| Writing + data analysis | ↑ | ~10% → ~20% |
And value rose across the board (freelance-marketplace proxy): average session +~27%; building +43%, operating +34%, fixing +32%. The report stresses the dollar figures are coarse and best read as relative movement, not literal value.
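Because an average session value is a share-weighted mean of per-mode values, its change can be split into a composition effect (sessions moving between modes) and a within-mode effect (each mode's value changing). The sketch below uses a standard shift-share identity, not a method the report describes. The shares for fixing, operating, and writing/analysis come from this page, with "other" as the remainder; every value level, and the growth factors for writing/analysis and "other", are hypothetical placeholders.

```python
def decompose(shares_before, shares_after, value_before, value_after):
    """Split the change in share-weighted average value into mix and within-mode parts."""
    modes = list(shares_before)
    mix = sum((shares_after[m] - shares_before[m]) * value_before[m] for m in modes)
    within = sum(shares_after[m] * (value_after[m] - value_before[m]) for m in modes)
    total = (
        sum(shares_after[m] * value_after[m] for m in modes)
        - sum(shares_before[m] * value_before[m] for m in modes)
    )
    return mix, within, total


# Shares from this page (Oct 2025 vs. Apr 2026); "other" is the remainder.
shares_before = {"fixing": 0.33, "operating": 0.14, "writing_analysis": 0.10, "other": 0.43}
shares_after = {"fixing": 0.19, "operating": 0.21, "writing_analysis": 0.20, "other": 0.40}

# HYPOTHETICAL value levels in arbitrary units; the page reports no dollar levels.
value_before = {"fixing": 1.0, "operating": 1.2, "writing_analysis": 0.8, "other": 1.5}
value_after = {
    "fixing": 1.0 * 1.32,            # +32% as reported for fixing
    "operating": 1.2 * 1.34,         # +34% as reported for operating
    "writing_analysis": 0.8 * 1.20,  # hypothetical growth
    "other": 1.5 * 1.30,             # hypothetical growth
}

mix, within, total = decompose(shares_before, shares_after, value_before, value_after)
print(f"composition effect: {mix:+.3f}")
print(f"within-mode effect: {within:+.3f}")
print(f"total change:       {total:+.3f}")
```

The two parts always sum to the total change, so the split shows how a headline average can move differently from the per-mode figures when the mix of work shifts at the same time.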
The interpretation: less time is going to the reactive, low-leverage work (debugging) and more to proactive, higher-leverage, more autonomous work (operate-the-system, analyze-the-data, write-the-document). The report explicitly frames Claude Code usage as a possible preview of where knowledge work is headed as agents embed in non-coding work — the doubling of writing/analysis is the leading edge of that spread beyond code.
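The share shifts in the table above can be read two ways: as percentage-point moves or as relative changes. A small sketch of the arithmetic, using the table's figures (the writing/analysis values are approximate in the source):

```python
# (before, after) session shares in percent, Oct 2025 -> Apr 2026, from the table above.
SHIFTS = {
    "fixing_broken": (33, 19),
    "operating": (14, 21),
    "writing_analysis": (10, 20),  # approximate in the source (~10% -> ~20%)
}


def describe_shift(before, after):
    """Return (percentage-point change, relative change in percent)."""
    points = after - before
    relative = (after - before) / before * 100
    return points, relative


for mode, (before, after) in SHIFTS.items():
    points, relative = describe_shift(before, after)
    print(f"{mode}: {points:+d} pp, {relative:+.0f}% relative")
```

On these figures the fixing share drops 14 points, a relative fall of about 42%, while operating grows by half and writing/analysis doubles.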
Telemetry, not survey — and the contrast it sets up#
Like Faros, this study reads behavior from system signals (here, session transcripts + automatic telemetry via Clio) rather than from how developers feel. But the two telemetry studies reach opposite-feeling conclusions, and the contrast is instructive:
- This study: session value up ~27%, debugging down, success rates up — an optimistic picture at the level of the individual interactive session.
- Faros: throughput up but quality down (bugs/dev +54%, incidents/PR +243%, review time 5×) — a pessimistic picture at the level of the organization's SDLC.
These are not a direct contradiction; they measure different units and stages. Anthropic measures did this session accomplish its goal and how valuable was it (a within-session, interactive-usage proxy, with the human in the loop); Faros measures what happens downstream across the whole org's pipeline weeks later (incidents, review queues, reopened tickets). One can be true at the session layer while the other is true at the org layer — "the task got done and felt valuable" and "the cumulative downstream cost is rising" are the felt-productivity-vs-system-outcome split that Telemetry vs. Survey Measurement itself flags. Evidence weight also differs: this is empirical research telemetry; Faros is vendor-claim lead-gen telemetry.
Connections#
- Harness Shrinkage as Models Improve — the debugging-share collapse and the move to end-to-end delegation is the shrinking harness read from usage data: as the model improves, less scaffolding/firefighting per task
- Acceleration Whiplash — the juxtaposed telemetry: value/success up at the session layer here vs. quality down at the org layer there; different units, both potentially true
- Telemetry vs. Survey Measurement — both studies measure behavior not feeling; this is the empirical cousin to Faros's vendor-claim telemetry, and the session-vs-org distinction is the same felt-vs-system split
- Returns to Expertise in Agentic Coding — the companion finding from the same study: who succeeds, alongside this what shifts
- AI as Primary Author — more end-to-end agentic use (operate/analyze/write) is the usage-side of authorship moving toward the agent
- Task Time-Horizon Scaling — the rising capability ceiling is the upstream cause: longer reliable task horizons are what let usage move from fixing to operating/analyzing whole workflows
- Claude Code — the product whose usage composition this tracks
- Conversation-to-Delegation Shift — the OpenAI/Codex, cross-population telemetry of the same asking→doing move; this page is its Anthropic/Claude-Code, single-tool sibling — two labs, two tools, one shift
Open questions#
- The window is seven months and the value proxy is coarse/relative. How much of the +27% is genuine task-complexity growth vs. classifier/marketplace-matching drift?
- The study excludes headless/SDK/IDE usage — a "substantial share," and likely the most automated/end-to-end. Does including it accelerate or reverse the composition shift?
- If "fixing" keeps falling, is that because models break less, or because broken-code work is migrating to non-interactive pipelines this study doesn't see?
Sources#
- Agentic coding and persistent returns to expertise — §"What people use Claude Code for", §"The work"
