Question#
What is the new bottleneck in AI-native product organizations: taste, evals, dogfooding, or accountability?
Short answer#
The bottleneck is accountable taste at speed.
Taste, evals, dogfooding, and accountability are not competing answers. They are a pipeline:
- Dogfooding creates the lived product sense.
- Taste decides what is worth building once implementation gets cheap (Engineer PM Convergence).
- Evals turn taste into a repeatable definition of done.
- Accountability makes a named human or team own the output, the review boundary, and the consequences.
At small-team/product-surface scale, the bottleneck looks like taste. At feature-regression scale, it looks like evals. At management/org scale, it becomes accountability, because output volume grows faster than human oversight capacity. Dogfooding is not the bottleneck; it is the training loop that keeps taste from decaying into dashboard theater.
Why the menu is wrong#
The named pages describe different layers of the same constraint, not four separate constraints.
AI Native Product Cadence says the model is not the main bottleneck: Cat Wu attributes the compression of release cadence from six months to one month, and sometimes to a single day, mostly to process, expectations, research-preview branding, mission-as-tiebreaker, launch-room compression, lighter PRDs, and engineer-with-product-taste delivery. Those changes remove handoff friction.
Engineer PM Convergence names what appears after handoffs shrink: code is cheaper, so "deciding what to write" appreciates. The scarce skill is product taste, independent of job title. Engineers, PMs, designers, managers, data scientists, and researchers all converge toward the same activity: choose, shape, ship, and judge.
Dogfooding as Product Discipline explains where that taste comes from. Product sense is not magic. It is built by direct use of the live product and contact with live users: Anthropic's "ant food," lunchtime model vibe-checks, founder-in-customer-Slack behavior. The person who does not use the product is forced back into metrics, dashboards, and PowerPoints.
Evals as Product Spec explains how taste survives scale. Evals are not QA after the fact; they are the product spec in executable form. A good eval captures the judgment call that would otherwise be litigated repeatedly in review. This is why "ten great evals" matters more than a large pile of weak checks: the eval must reveal what is broken and encode what good means.
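To make "product spec in executable form" concrete, here is a minimal sketch of an eval suite in which each case records both a runnable check and the judgment it encodes. Everything in it is an illustrative assumption rather than anything the source pages prescribe: the `EvalCase` structure, the two case names, the string-matching checks, and the `fake_agent` stand-in for a real model call are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One judgment call, frozen into a runnable check (hypothetical structure)."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the output meets the bar
    why_it_matters: str           # the taste judgment the check encodes


def run_evals(generate: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case against a text-producing function; report pass/fail by name."""
    return {case.name: case.check(generate(case.prompt)) for case in cases}


# Hypothetical cases for a coding assistant; the checks are deliberately crude.
CASES = [
    EvalCase(
        name="runs_tests_after_edit",
        prompt="Fix the failing date parser.",
        check=lambda out: "pytest" in out,
        why_it_matters="Edits shipped without a test run hide regressions.",
    ),
    EvalCase(
        name="admits_uncertainty",
        prompt="What does the undocumented flag --zx do?",
        check=lambda out: "not sure" in out.lower() or "cannot find" in out.lower(),
        why_it_matters="A fluent guess about an unknown flag reads fine and is wrong.",
    ),
]


def fake_agent(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline (assumption)."""
    if "date parser" in prompt:
        return "Patched parse_date(); ran pytest tests/test_dates.py, all passing."
    return "The --zx flag enables extended mode."  # fluent but unsupported


results = run_evals(fake_agent, CASES)
failed = [name for name, ok in results.items() if not ok]
print(results)
print("broken:", failed)
```

The `why_it_matters` field is the part that carries taste: a check without a stated reason tends to drift toward surface properties. Real evals would likely use graders richer than substring matching; the point of the sketch is only that a small number of named, reasoned cases can say both what is broken and what good means.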
Human-AI Accountability Redesign explains why even good taste plus good evals are insufficient at org scale. Output volume expands, but human review capacity does not. If the unit of accountability does not match what a human can actually review, the org gets output that is faster but unowned, along with diffused responsibility and weaker error-catching.
The bottleneck stack#
| Layer | Bottleneck symptom | Durable mechanism | Failure if missing |
|---|---|---|---|
| Cadence | Shipping still takes quarters while agents make implementation cheap | AI Native Product Cadence: remove handoffs, use mission as tiebreaker, keep launch machinery always warm | Process absorbs the model's productivity gain |
| Taste | The team can build anything but cannot choose what matters | Engineer PM Convergence: high-context generalists with product judgment | Fast drift, overlapping features, scope churn |
| Taste acquisition | Product decisions come from dashboards instead of lived contact | Dogfooding as Product Discipline and Managers as ICs | Product sense rots; managers and PMs stop feeling the product |
| Taste encoding | Good judgment stays tacit and cannot be regression-tested | Evals as Product Spec | Vibes, anecdotes, repeated debate, silent regression |
| Ownership | Output volume exceeds review capacity | Human-AI Accountability Redesign: decision rights, escalation, oversight-quality performance management | Diffused accountability, review fatigue, unowned errors |
This stack gives the clean answer: taste is the scarce input, evals are the encoding, dogfooding is the training loop, accountability is the scaling constraint.
Small team vs larger org#
For a small Claude Code-style team, the visible bottleneck is taste. The team can move quickly because role boundaries collapse: everyone codes, engineers do PM work, PMs and designers can ship, and managers start as ICs. The team does not wait for a sequential PM-to-design-to-engineering-to-docs chain.
But this only works if the people have enough product sense to avoid turning "just do things" into random motion. AI Native Product Cadence explicitly names the cost of this speed: product consistency suffers, features can overlap, users can feel like they are on a treadmill, code review gets harder, and some releases are buggier than ideal. The taste bottleneck is real because speed amplifies both good and bad judgment.
At larger org scale, the visible bottleneck shifts to accountability. Human-AI Accountability Redesign is blunt: work, roles, and governance built for human pace do not automatically accommodate agentic output. A manager or PM who could oversee five documents a week cannot automatically oversee fifty AI-produced artifacts. The accountable unit has to be redesigned around what humans can actually inspect and own.
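The five-versus-fifty example can be worked through as simple arithmetic. The sketch below assumes, purely for illustration, that one reviewer's capacity stays fixed at five artifacts a week and that every artifact costs the same to review; the function names are hypothetical.

```python
import math

REVIEWS_PER_WEEK = 5  # figure from the text; treated as fixed for illustration


def review_coverage(artifacts_per_week: int, reviews_per_week: int = REVIEWS_PER_WEEK) -> float:
    """Fraction of artifacts one reviewer can actually inspect, capped at 1.0."""
    if artifacts_per_week <= 0:
        return 1.0
    return min(1.0, reviews_per_week / artifacts_per_week)


def owners_needed(artifacts_per_week: int, reviews_per_week: int = REVIEWS_PER_WEEK) -> int:
    """Accountable reviewers required for every artifact to be inspected once."""
    return math.ceil(artifacts_per_week / reviews_per_week)


for volume in (5, 50):
    # Prints: volume, share one reviewer can cover, reviewers needed for full coverage
    print(volume, f"{review_coverage(volume):.0%}", owners_needed(volume))
```

Under these assumptions, a tenfold rise in output leaves one reviewer covering a tenth of the work, so the org either adds nine more accountable owners or changes what counts as a reviewable unit.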
So the ordering is scale-dependent:
- Individual contributor: taste and verification discipline.
- Feature team: dogfooding plus evals.
- Product org: accountability, decision rights, and span-of-control redesign.
Why dogfooding matters more than it first appears#
Dogfooding is easy to underrate because it is not a formal artifact. But the source pages treat it as the origin of the whole taste supply chain.
Dogfooding as Product Discipline says product sense is built by relentless first-hand use. Managers as ICs makes that structural: every Claude Code manager starts as an IC and keeps product ownership because a manager outside the codebase is not experiencing the product being shipped. That is not a cultural flourish. It is org design aimed at preserving taste at the management layer.
The same pattern appears in Evals as Product Spec: vibe-checks are not the final proof, but they generate the hypotheses that evals later freeze. "This model is not testing itself enough" begins as a taste-maker observation; the eval makes it durable.
Without dogfooding, evals become cargo-cult measurement. You can write a runnable check, but you will encode shallow surface properties because nobody has the lived judgment to know which failures matter.
Why evals are the product-spec bottleneck#
Evals as Product Spec makes the strongest claim for evals: in AI products, the question is no longer simply whether the team can ship, but whether it can tell the difference between a feature that works and one that merely produces fluent output.
That matters because AI Native Product Cadence increases release frequency. A 1-day or 1-week release loop is only sustainable if regressions are cheap to detect. Otherwise, speed just compounds ambiguity.
Evals sit exactly between taste and accountability:
- They make a taste judgment runnable.
- They let engineers and PMs converge on the same definition of done.
- They reduce repeated human debate.
- They provide the regression guardrail that lets product cadence stay high.
But evals do not replace accountability. Someone still decides which evals matter, which failures block launch, which trade-offs are acceptable, and when a passing eval suite is insufficient because the product feels wrong.
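One way to picture that division of labor is a launch gate in which the eval suite is necessary but not sufficient. This is a hedged sketch, not a described practice from the source pages: the `EvalResult` fields, the `launch_decision` function, and the owner and eval names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalResult:
    name: str
    passed: bool
    blocks_launch: bool  # decided by a human ahead of time, not by the eval
    owner: str           # named person or team accountable for this check


def launch_decision(results: list[EvalResult], signed_off_by: Optional[str]) -> tuple[bool, list[str]]:
    """Return (can_launch, reasons). A green suite alone never launches."""
    reasons = [
        f"{r.name} failed (owner: {r.owner})"
        for r in results
        if r.blocks_launch and not r.passed
    ]
    if signed_off_by is None:
        reasons.append("no named human sign-off")
    return (not reasons, reasons)


suite = [
    EvalResult("runs_tests_after_edit", passed=True, blocks_launch=True, owner="agent-quality team"),
    EvalResult("tone_matches_docs", passed=False, blocks_launch=False, owner="docs lead"),
]

print(launch_decision(suite, signed_off_by=None))
print(launch_decision(suite, signed_off_by="release owner"))
```

The non-blocking failure does not stop the launch, but the missing sign-off does. The choices that remain with a person are exactly the ones the paragraph lists: which checks carry `blocks_launch`, and whether to withhold sign-off when a passing suite still feels wrong.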
Accountability is the final bottleneck#
If forced to pick one answer for an organization, pick accountability.
Taste can be trained through dogfooding. Evals can encode some of that taste. Cadence can be accelerated by removing handoffs. But none of that solves the core org problem: who owns the outcome when AI multiplies output volume?
Human-AI Accountability Redesign says the human role concentrates on supervision, judgment, relationship-building, and ambiguity. It also says oversight capacity does not expand just because output does. That is the hard limit. An AI-native product org that fails here gets:
- more shipped artifacts than it can review;
- more decisions made without explicit decision rights;
- more ambiguous escalation paths;
- more speed rewarded without oversight quality;
- more errors with unclear human ownership.
This is why accountability is not bureaucracy in this context. It is the structure that lets taste and evals matter after the org becomes fast.
Practical synthesis#
The operating model should be:
- Keep builders close to the product. Use Dogfooding as Product Discipline and Managers as ICs so taste stays grounded in real use, not second-hand reporting.
- Hire and promote for taste across roles. Engineer PM Convergence implies taste is not a PM-only skill; it is the core skill of anyone who can now ship.
- Turn recurring taste judgments into evals. Use Evals as Product Spec where ambiguity repeats or regressions are expensive.
- Define accountable units around review capacity. Use Human-AI Accountability Redesign: decision rights, escalation rules, and performance management that rewards oversight quality, not just output.
- Treat cadence as a stress test. AI Native Product Cadence is healthy only when speed improves learning without outrunning ownership.
Bottom line#
The new bottleneck is not "taste or evals or dogfooding or accountability." The bottleneck is keeping human judgment accountable while AI collapses implementation cost.
Taste chooses. Dogfooding trains taste. Evals encode taste. Accountability owns the consequences. Remove any one of the four and the AI-native product org fails in a predictable way: tasteless velocity, ungrounded metrics, anecdotal judgment, or fast unowned output.
Related#
- AI Native Product Cadence — the cadence pressure that exposes the bottleneck.
- Engineer PM Convergence — why taste moves across job titles.
- Dogfooding as Product Discipline — how product sense is built and maintained.
- Managers as ICs — management-layer dogfooding and ownership.
- Evals as Product Spec — taste rendered as executable definition of done.
- Human-AI Accountability Redesign — org-scale ownership, decision rights, escalation, and oversight quality.
- Verification as the New Bottleneck — broader engineering-side version of the same shift from generation to review.
- Orchestration vs Employee Framing: Reconciling the Founder's Playbook with HBR's Accountability Evidence — accountability framing for agent-orchestrated work.
