Summary
Faros AI's methodological argument, and the basis for the most consequential conflict in its 2026 report: during rapid AI transformation, perception lags reality, so survey-based engineering research systematically misses the downstream damage that system telemetry catches in near-real-time. Faros draws its findings from engineering systems (task trackers, IDEs, static analysis, CI/CD, version control, incident management) rather than from how developers feel, and uses that distinction to directly contradict Google's DORA 2025 conclusions.
vendor-claim source: Faros's own platform is the telemetry instrument, so "telemetry beats surveys" is also a sales argument for that platform. The methodological point stands on its own merits, but the conclusion conveniently favors the vendor's product. See Acceleration Whiplash for the full evidence note.
Why perception lags reality
The mechanism Faros proposes: at the individual level developers genuinely are more productive — task completion is up, code flows faster, the tools feel powerful — so surveys capture real, positive feeling. What surveys cannot capture is what happens downstream: "the review queues quietly backing up, the incidents accumulating in production, the bugs reaching customers." By the time those consequences show up in how people feel, "months have passed and the signal is already stale." Telemetry, drawn from the systems where work actually happens, does not lag. The claim: engineering leaders making consequential decisions about headcount, tooling, and process "need data as close to real time as possible… not how people feel about the work after the fact."
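One way to make the lag claim checkable on your own data is to ask at what time shift a survey series best tracks an earlier telemetry series. The sketch below is only an illustration: the two monthly series, the metric names, and the `estimate_lag` helper are all hypothetical, and nothing here comes from the Faros report or its platform.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def estimate_lag(telemetry, survey, max_lag):
    """Return (best_lag, scores): the shift at which survey best tracks earlier telemetry."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair telemetry in month t with the survey reading from month t + lag.
        earlier = telemetry[: len(telemetry) - lag]
        later = survey[lag:]
        scores[lag] = pearson(earlier, later)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical monthly data, invented for illustration only.
review_queue_hours = [2, 2, 3, 5, 8, 9, 7, 6, 9, 12, 11, 10]  # telemetry
reported_friction = [1, 1, 1, 2, 2, 3, 5, 8, 9, 7, 6, 9]      # survey score

best_lag, scores = estimate_lag(review_queue_hours, reported_friction, max_lag=5)
print(f"survey best matches telemetry from {best_lag} month(s) earlier")
```

With these made-up series the survey is simply the telemetry delayed by three months, so the best shift comes out as 3. Real survey data would be noisier and sampled far less often, which is part of why a lag like this is hard to see from surveys alone.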
The DORA contradiction
This is a flagged inter-source contradiction. DORA's 2025 State of AI-Assisted Software Development concluded that AI amplifies existing strengths and weaknesses, and that strong engineering foundations protect against AI's downsides. Faros's telemetry, it claims, "does not support that as a protective factor": high-performing organizations experience the same downstream deterioration as everyone else (see the maturity-independence finding in Acceleration Whiplash).
Weighing the conflict by method and incentive:
- DORA 2025: survey-based; large, long-running, vendor-neutral-ish (Google/DevOps Research). Strength: breadth and continuity. Weakness, per Faros: perception lag during fast transitions.
- Faros 2026: telemetry-based; within-company longitudinal comparison (low- vs high-adoption quarters), Spearman ρ at p<0.05. Strength: measures behavior, not feeling, in near-real-time. Weakness (vendor-claim): Faros sells the platform, and "your mature practices won't save you, you need visibility + a context engine" is precisely the conclusion that grows its market.
Neither is a clean win. The honest read: Faros's measurement critique of surveys is sound (lagging perception is real), but its substantive claim that maturity offers zero protection should be held with the vendor incentive in view — it is the conclusion most favorable to selling the instrument. Worth tracking against future DORA editions and any non-vendor telemetry study.
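For readers weighing the "Spearman ρ at p<0.05" claim, the following is a minimal sketch of what a rank correlation across quarters looks like. It is not Faros's pipeline: the quarterly figures are invented, the function names are hypothetical, and the exact permutation p-value is a stand-in chosen because it needs only the standard library (the report does not say how its p-values were computed).

```python
from itertools import permutations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def exact_p_value(xs, ys):
    """Two-sided exact permutation p-value; only feasible for small n."""
    observed = abs(spearman_rho(xs, ys))
    rx, ry = average_ranks(xs), average_ranks(ys)
    hits = total = 0
    for perm in permutations(ry):
        total += 1
        # Small tolerance so floating-point noise does not drop exact ties.
        if abs(pearson(rx, perm)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical quarters for one company, invented for illustration only.
ai_adoption_share = [0.10, 0.15, 0.25, 0.40, 0.55, 0.70, 0.80]
review_wait_hours = [5.0, 6.0, 6.5, 9.0, 8.0, 12.0, 14.0]

rho = spearman_rho(ai_adoption_share, review_wait_hours)
p_value = exact_p_value(ai_adoption_share, review_wait_hours)
print(f"rho={rho:.3f}, p={p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

A significant ρ here says only that review wait rises monotonically with adoption inside this one company. It does not rule out other things that changed over the same quarters, which is one reason the maturity-independence claim deserves independent replication.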
Connections
- Acceleration Whiplash — the maturity-independence finding rests on this telemetry-over-survey methodology
- Production-Sourced Evaluation — the same "measure from the real system, not a proxy" instinct applied to model evals; telemetry-vs-survey is its engineering-metrics cousin
- Evals as Product Spec — Cat Wu's evals encode the spec; telemetry encodes what actually shipped — both prefer ground-truth signal over self-report
- Verification as the New Bottleneck — Fiona Fung's warning to break PR-cycle-time into funnel chunks rather than read the aggregate is the same "instrument carefully or the signal misleads" discipline
- Compounding Data Moat — owning the telemetry stream is itself a moat; the report is a demonstration of what the data asset enables
- Agentic Coding Work-Composition Shift: the empirical cousin. Anthropic's Clio-based 400K-session telemetry reads behavior-not-feeling the same way, but is a research artifact (validated classifiers, controls) rather than a vendor-claim lead-gen report, and its session-layer optimism vs Faros's org-layer pessimism is the felt-vs-system split this page names
- Conversation-to-Delegation Shift: the third major usage-telemetry study (OpenAI/Codex, empirical), which extends this page's argument. As usage becomes delegation, even interaction-count metrics (active users, chats) go stale; track complexity, runtime, concurrency, reuse, and output instead
- Anthropic Economic Index: the program that resolves this page's dichotomy. It links usage telemetry to survey responses per person (the Cadences report, ~9,700 linked respondents), treating telemetry and survey as complements rather than rivals
- AI Usage Cadences — the AEI's continuous hourly telemetry is this page's "measure the real system finely" principle pushed to time resolution
Open questions
- Surveys and telemetry measure different things (felt productivity vs. system outcomes); is the "contradiction" partly a category error — both true at their own layer — rather than one being wrong?
- Is there a non-vendor telemetry dataset large enough to adjudicate the maturity-protection question independently of Faros's commercial framing?
Sources
- AI Engineering Report 2026: The Acceleration Whiplash — "A direct counterpoint to DORA's 2025 findings"; Research Methodology; Report's Purpose
- DORA, 2025 State of AI-Assisted Software Development (cited by Faros): https://dora.dev/research/2025/dora-report/
