H
Howardism
Howardism · Vol. 03Plate II · No. 02

Product & Org, in order.

Notes6DomainProduct & OrgOpen Qs17Newest23 May 2026Oldest6 May 2026

Product cadence, org design, and the AI-native team.

Map of Content for the product-org domain — 6 concepts. Curated entry point; see Home for all domains.

  • AI Native Product Cadence (hub) — Cat Wu's 6mo→1mo→1day cadence at Anthropic: research-preview branding, mission-as-tiebreaker, evergreen launch room, lighter PRDs, weekly metrics readouts
  • Dogfooding as Product Discipline — Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, Glasgow founder-led sales)
  • Engineer PM Convergence — Generalists across disciplines; product taste as bottleneck skill; Anthropic Claude Code team as case study; "just do things" cultural substrate
  • Evals as Product Spec — Cat Wu's framing of evals as the emerging core PM skill: ten great evals beats a hundred mediocre; encode what done looks like for ambiguous AI features; companion to introspection (hypothesis) and vibe-check (direction)
  • Managers as ICs — Every Claude Code manager starts as an IC; flat org; agentic coding collapsed the onboarding cost that pushed managers out of the codebase
  • Model Introspection Feedback — Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticism; caveats around model self-report fidelity

Open questions 17 open

  • AI Native Product Cadence
    • Does the cadence scale beyond ~100 people? Anthropic itself is bigger (~30-40 PMs alone), but the [[claude-code]] team that visibly drives cadence is small.
    • What's the equivalent of research-preview branding for B2B enterprise launches where customers expect stability? Cat doesn't address.
    • How much of the cadence is structural (process choices) vs cultural (talent density)? Probably both, ratio unclear.
  • Dogfooding as Product Discipline
    • Dogfooding works when the team *is* the user (Claude Code) or near it (Cat Wu, Boris). How do you build product sense for users very unlike you — does "talk to customers" fully substitute, as Glasgow/Fung's small-business work suggests?
    • Can dogfooding scale, or does it implicitly cap how large an AI-native product org can stay taste-driven before it reverts to dashboards?
  • Engineer PM Convergence
    • Does this scale beyond ~50-person Claude Code-style teams? Boris hedges: "I think this is going to be a question for years."
    • What happens to formal PM career ladders in companies where engineers do PM work? Open at Anthropic per Cat.
    • Cross-disciplinary generalist is a hiring bar — where does the supply come from? Career changers, or new-grad bias toward AI-native education?
  • Evals as Product Spec
    • How do you write an eval for *taste*-driven features like [[claude-character-as-product|character]]? Amanda's role is canonical for being eval-resistant; Cat names her as someone who *is* good at evals here, but doesn't describe the technique.
    • The 10-vs-100 number is given without justification. Is there a Goldilocks zone, or does it depend on feature surface area? [[client-side-agent-optimization]]'s framing of combos suggests evals also have a combinatorial explosion problem.
    • How do evals interact with [[harness-shrinkage-as-models-improve]]? When a harness asset shrinks because the model now handles it natively, the evals built around the old harness may become artifacts rather than guardrails. Does Anthropic retire evals or repurpose them?
    • Is there a single non-Anthropic example of a PM-as-eval-writer to cite, or is this currently a Cat-Wu-singular framing? The Matt Pocock workshop reaches the same place from a different vocabulary, but no third source has been ingested yet.
  • Managers as ICs
    • Fung's own open question: "Do you still need separate iOS and Android orgs?" — if engineers flex across platforms via Claude, the traditional platform-split org may dissolve too. How far does flattening go?
    • Does manager-as-IC scale past a certain org size, or only work while Claude Code is small and the codebase is Claude-legible?
  • Model Introspection Feedback
    • How reliable are 4.7-class introspective reports? Anthropic's interpretability research suggests partial fidelity but not full. Empirically, Cat reports it's good enough to drive harness fixes — but unclear at what model scale this technique becomes load-bearing.
    • Does adversarial introspection ("why did you fail?") yield different signal than neutral ("walk me through your reasoning")? Worth probing.
    • Could a meta-agent run introspection automatically against logged failures? Sounds tractable but no public implementation.