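Sources#
Summary#
Fiona Fung's core thesis, drawn from leading the Claude Code + Cowork engineering teams: for years, engineering bandwidth was the expensive resource, and planning, review, and process all existed to protect it. Once agentic coding makes writing code cheap, the bottleneck moves to verification, review, and maintenance. "On the Claude Code team, writing code really is no longer the slow part." The new scarce resource is confidence that a change is correct, and it becomes scarcer precisely because bandwidth (and with it throughput) has exploded.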
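Why verification is now the binding constraint#
Three forces converge:
- Volume. Bandwidth has grown so much that "we have to pay more attention to confirming: is it correct."
- Blurred role boundaries. More people (designers, managers, PMs) now check in changes, so everyone needs confidence that their own changes are correct.
- Maintenance cost. Higher throughput means more to maintain, so maintenance cost becomes a first-class concern rather than an afterthought.
This is the organization-level counterpart to Karpathy's The Verifiability Thesis ("LLMs automate what you can verify"), and the demand side of Harness Shrinkage as Models Improve (prompt scaffolding shrinks; mechanical verification remains load-bearing).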
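TDD loses its tax#
One vivid symptom of the shift: TDD used to feel like "eating broccoli" (write the failing test first, confirm that it fails, then fix it). With Claude, Fung finds it "much more fun and enjoyable... it takes the tax out of test-driven development." The economics flip: when writing tests is nearly free, the discipline that anchors verification (a test shown to fail first and then pass) is pure upside. (See the tdd / red-green-refactor discipline; the failing-test-first step is the verifier.)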
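Shift left#
Her recurring mantra: shift left, meaning use automation to catch problems closer to the source instead of waiting until a customer hits them. "What's better than me hitting the bug first? Having automation that catches it closer to the source." As throughput rises, the only way verification can keep up is to be automated and early rather than manual and late.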
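Who reviews, and where the human-in-the-loop line sits#
Before Claude Code shipped its own code-review feature, "how do you keep up with code review?" was the question she was asked most often. The answer: Claude Code review handles style, lint, obvious bugs, and spec drift (if you check the spec into the codebase, "Claude is very good at verifying against spec drift"). Where it matters, though, humans stay in the loop: legal review, risk tolerance, trust boundaries. "Trust but verify, and hand off to humans where humans bring the necessary expertise." The division of labor is to automate mechanical verification and reserve human judgment for decisions about risk and trust boundaries. (See Deep Modules for Agents: the reviewer in a fresh context.)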
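One way to picture that division of labor is a small routing rule that decides which changes need a human in addition to automated review. The sketch below is purely illustrative: the path patterns, the reason labels, and the function name `required_human_reviews` are assumptions made for the example, not something the source describes.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from a human-review reason to the file patterns that
# trigger it. A real team would define its own boundaries.
HUMAN_REVIEW_PATTERNS = {
    "legal": ["legal/*", "LICENSE*"],
    "trust_boundary": ["*/auth/*", "*/permissions/*"],
}


def required_human_reviews(changed_paths: list[str]) -> list[str]:
    """Return the sorted reasons a human must review this change.

    An empty list means automated review (style, lint, obvious bugs,
    spec drift) is treated as sufficient under these assumed rules.
    """
    reasons = set()
    for path in changed_paths:
        for reason, patterns in HUMAN_REVIEW_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                reasons.add(reason)
    return sorted(reasons)
```

Under these assumed patterns, a change touching only `src/ui/button.py` returns an empty list, while one touching `legal/terms.md` and `src/auth/token.py` returns both reasons.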
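Measuring the shift (and a trap)#
The signals she watches: onboarding ramp-up time ↓, PR cycle time ↓, Claude-assisted commits ↑ ("I haven't seen a commit that wasn't Claude-assisted in months"). The trap: do not look only at end-to-end PR cycle time; break it into funnel stages. If cycle time is not falling, the cause is not necessarily low AI adoption; it may be that **the CI/build system is choking on the new throughput.** And throughput itself is not the goal: "find a way to measure the thing you actually want to solve," not just speed.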
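Breaking cycle time into funnel stages might look like the sketch below. The stage names, the `PullRequest` fields, and the assumption that CI runs after approval are all hypothetical simplifications chosen for illustration; the source does not specify how the funnel is cut.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    opened: datetime
    first_review: datetime
    approved: datetime
    ci_passed: datetime
    merged: datetime


# (stage name, start field, end field); assumed ordering for this sketch.
STAGES = [
    ("wait_for_review", "opened", "first_review"),
    ("review", "first_review", "approved"),
    ("ci", "approved", "ci_passed"),
    ("merge_queue", "ci_passed", "merged"),
]


def stage_medians(prs: list[PullRequest]) -> dict[str, float]:
    """Median hours spent in each funnel stage across the given PRs."""
    result = {}
    for name, start, end in STAGES:
        hours = [
            (getattr(pr, end) - getattr(pr, start)).total_seconds() / 3600
            for pr in prs
        ]
        result[name] = median(hours)
    return result


def slowest_stage(prs: list[PullRequest]) -> str:
    """Name of the stage with the largest median duration."""
    medians = stage_medians(prs)
    return max(medians, key=medians.get)


def make_pr(wait: float, review: float, ci: float, queue: float) -> PullRequest:
    """Build a PR from per-stage durations in hours (helper for examples)."""
    t0 = datetime(2025, 1, 1, 9, 0)
    t1 = t0 + timedelta(hours=wait)
    t2 = t1 + timedelta(hours=review)
    t3 = t2 + timedelta(hours=ci)
    t4 = t3 + timedelta(hours=queue)
    return PullRequest(t0, t1, t2, t3, t4)
```

With two made-up PRs, `make_pr(1, 2, 6, 0.5)` and `make_pr(0.5, 1, 8, 0.5)`, `slowest_stage` reports `"ci"`: review is quick, but the build is where the time goes, which the end-to-end number alone would hide.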
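Related links#
- Fiona Fung: the author of this thesis
- The Verifiability Thesis: Karpathy's "automate what you can verify" is the model-level cause; this is the organization-level consequence
- Harness Shrinkage as Models Improve: the synthesis this corroborates; scaffolding shrinks, mechanical verification does not
- Evals as Product Spec: Cat Wu's evals encode verification as the product spec; the PM-side companion piece
- Code as Source of Truth: checking the spec into the repo is exactly what lets Claude verify spec drift
- Building Is Cheap, Arguing Is Expensive: the upstream half; generation is cheap, so verification (and judgment) is where cost concentrates
- Claude Code Auto Mode: the auto-approval classifier is verification automation at the permission level
- Deep Modules for Agents: the reviewer in a fresh context is a move that raises verification quality at the code-review level
- AI Brain Fry: the risk if verification stays manual; as volume grows, oversight fatigue increases errors
- AI-Driven Formal Proof Search: the extreme case; with the compiler as verifier, the bottleneck is fully mechanized
- Recursive Self-Improvement: the organizational version of Amdahl's law; as generation speeds up, human code review becomes Anthropic's new bottleneck, which is this thesis scaled up to AI building AI
- AI Accelerating AI Development: supporting data; an automated Claude reviewer could have caught, before merge, roughly 1/3 of the bugs behind past production incidents
- Research Taste as the Human Bottleneck: when human review and judgment cannot keep pace with Claude's generation, judgment becomes the binding constraint; this is the higher-altitude form of verification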
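Derived content#
- When Does Verification Quality Determine Whether AI Automation Works?: generalizes this bottleneck into a ladder of verification quality: Lean/formal proofs, software CI, vulnerability reproduction, and noisy judgment tasks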
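Open questions#
- Fung's own open question: "How far should fully automated review be pushed?" Where is the speed/safety balance point, and how can humans keep their confidence without reintroducing the review bottleneck?
- If CI/build is the hidden choke point, does verification infrastructure (test runners, CI capacity) become the real capital expenditure of an AI-native org?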
