
Verification Becomes the New Bottleneck

Published: May 23, 2026 · Filed: Concept · Domain: AI Engineering · Tags: Agent Engineering, AI Coding Workflow, AI Native Org · Reading: 5 min · Source: AI-synthesised

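Fiona Fung: writing code is no longer the bottleneck; verification, review, and maintenance are. Also covered: shift left, TDD losing its tax, and funnel analysis of PR cycle time.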

Sources

Running an AI-native engineering org

Summary

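The core thesis Fiona Fung draws from leading the Claude Code + Cowork engineering team: for years, engineering bandwidth was the expensive resource, and planning, review, and process all existed to protect it. Once agentic coding makes writing code cheap, the bottleneck moves to verification, review, and maintenance. "On the Claude Code team, writing code really isn't the slow part anymore." The new scarce resource is confidence that a change is correct, and it becomes scarcer precisely because bandwidth (and with it throughput) has exploded.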

Why verification is now the constraint

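Three forces converge:

Volume. Bandwidth has grown so much that "we have to pay more attention to confirming: is it correct."
Blurred role boundaries. More people (designers, managers, PMs) now check in changes, so everyone needs confidence that their own changes are correct.
Maintenance cost. Higher throughput means more to maintain, so maintenance cost becomes a first-class concern rather than an afterthought.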

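This is the organization-level counterpart to Karpathy's The Verifiability Thesis ("LLMs automate what you can verify"), and the demand side of Harness Shrinkage as Models Improve (prompt scaffolding shrinks; mechanical verification remains load-bearing).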

TDD loses its tax

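One vivid symptom of the shift: TDD used to feel like "eating broccoli". You write a failing test first, confirm that it fails, then fix it. With Claude, Fung found it "much more fun and enjoyable… it takes away the tax of test-driven development." The economics flip: when writing tests is nearly free, the discipline that anchors verification (a test shown to fail first and then pass) is pure upside. (See the tdd / red-green-refactor discipline; the failing-test-first step is the verifier.)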
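
As a rough illustration of the fail-first-then-pass discipline described above, the sketch below runs the same test against a placeholder (the "red" step) and then against a working implementation (the "green" step). The `slugify` function, its stub, and the `run_test_against` helper are hypothetical names invented for this example; they do not come from the talk.

```python
import io
import unittest


def slugify_stub(title: str) -> str:
    # Red phase: hypothetical placeholder with no behavior yet,
    # so the test below must fail against it.
    raise NotImplementedError


def slugify(title: str) -> str:
    # Green phase: the simplest implementation that satisfies the test.
    return "-".join(title.lower().split())


def run_test_against(func) -> bool:
    """Run the same test case against `func`; return True if it passes."""

    class SlugifyTest(unittest.TestCase):
        def test_spaces_become_hyphens(self):
            self.assertEqual(func("Shift Left Now"), "shift-left-now")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    # Send runner output to a buffer so only the boolean result matters.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()


# The test only counts as a verifier if it is seen to fail first.
red = run_test_against(slugify_stub)   # expected to fail
green = run_test_against(slugify)      # expected to pass
print(f"red passed: {red}, green passed: {green}")
```

Seeing the red run fail is what shows the test can detect a missing or wrong implementation; a test that has only ever passed offers much weaker evidence.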

Shift left

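Her recurring mantra is shift left: catch problems closer to the source through automation, rather than waiting until a customer hits them. "What's better than me hitting the bug first? Having automation that catches it closer to the source." As throughput rises, the only way verification keeps up is to make it automated and early rather than manual and late.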

Who reviews, and where the human-in-the-loop line falls

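Before Claude Code shipped its own code-review feature, "How do you keep up with code review?" was the question she was asked most often. The answer: Claude Code review handles style, lint, obvious bugs, and spec drift (if you check the spec into the codebase, "Claude is very good at verifying against spec drift"). But humans stay in the loop where it matters: legal review, risk tolerance, and trust boundaries. "Trust but verify, and hand off to humans where they bring the necessary expertise." The division of labor: automate mechanical verification, and reserve human judgment for decisions about risk and trust boundaries. (See Deep Modules for Agents: the reviewer in a fresh context.)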
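
To make "spec drift" concrete, here is a minimal sketch of a purely mechanical drift check. It is not how Claude Code review works; it only assumes that a spec checked into the repository can be reduced to a mapping from function names to expected parameter names. The `SPEC` dictionary, the two example functions, and `find_spec_drift` are all hypothetical.

```python
import inspect

# Hypothetical checked-in spec: function name -> expected parameter names.
SPEC = {
    "create_user": ["name", "email"],
    "delete_user": ["user_id"],
}


def create_user(name, email):
    """Matches the spec."""


def delete_user(user_id, force=False):
    """Has drifted: `force` was added without updating the spec."""


def find_spec_drift(spec, namespace):
    """Return a human-readable message for each spec/code mismatch."""
    drift = []
    for func_name, expected in spec.items():
        func = namespace.get(func_name)
        if func is None:
            drift.append(f"{func_name}: missing from code")
            continue
        actual = list(inspect.signature(func).parameters)
        if actual != expected:
            drift.append(f"{func_name}: spec {expected}, code {actual}")
    return drift


code = {"create_user": create_user, "delete_user": delete_user}
for message in find_spec_drift(SPEC, code):
    print(message)
```

A check like this covers only the mechanical part. Whether the new `force` parameter is acceptable is the kind of risk decision the passage above leaves to a human.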

Measuring the shift (and a trap)

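The signals she watches: onboarding ramp-up time ↓, PR cycle time ↓, Claude-assisted commits ↑ ("I haven't seen a commit that wasn't Claude-assisted in months"). The trap: don't look only at end-to-end PR cycle time; break it into funnel segments. If cycle time isn't dropping, the cause isn't necessarily low AI adoption; it may be that **the CI/build system is choking under the new throughput.** And throughput itself is not the goal: "find a way to measure what you're actually trying to solve," not just speed.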
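
One way to picture the funnel breakdown is the sketch below, which splits each PR's cycle time into stages and reports the median time per stage. The stage names (`opened`, `ci_green`, `first_review`, `approved`, `merged`), the assumption that the stages happen strictly in that order, and the sample timestamps are all invented for illustration; a real pipeline may overlap CI and review.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

# Hypothetical funnel: consecutive (start, end) timestamps for each stage.
STAGES = [
    ("opened", "ci_green"),
    ("ci_green", "first_review"),
    ("first_review", "approved"),
    ("approved", "merged"),
]


@dataclass
class PullRequest:
    opened: datetime
    ci_green: datetime
    first_review: datetime
    approved: datetime
    merged: datetime


def funnel_medians(prs):
    """Median hours spent in each stage across all PRs."""
    medians = {}
    for start, end in STAGES:
        hours = [
            (getattr(pr, end) - getattr(pr, start)).total_seconds() / 3600
            for pr in prs
        ]
        medians[f"{start}->{end}"] = median(hours)
    return medians


def slowest_stage(prs):
    """Name of the stage with the largest median duration."""
    medians = funnel_medians(prs)
    return max(medians, key=medians.get)


def at(hour, minute=0):
    # Helper for the made-up sample data: a time on one arbitrary day.
    return datetime(2026, 5, 1, hour, minute)


sample = [
    PullRequest(at(9), at(12), at(13), at(13, 30), at(14)),
    PullRequest(at(9), at(14), at(15), at(16), at(16, 30)),
]
print(funnel_medians(sample))
print("slowest stage:", slowest_stage(sample))
```

In this made-up data, the wait for CI dominates even though review is fast. An end-to-end number alone would hide that.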

Derived content

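When Does Verification Quality Determine Whether AI Automation Works? This piece generalizes the bottleneck into a ladder of verification quality: Lean/formal proofs, software CI, vulnerability reproduction, and noisy judgment tasks.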

Open questions

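Fung's own open question: "How far should fully automated review be pushed?" Where is the balance point between speed and safety, and how do humans keep their confidence without reintroducing the review bottleneck?
If CI/build is the hidden choke point, does verification infrastructure (test runners, CI capacity) become the real capital expenditure of an AI-native org?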



