Research Taste as the Human Bottleneck

Published: June 7, 2026 · Filed: Concept · Domain: Governance & Workforce · Tags: Governance, Human AI Collaboration, Research, Role Evolution, Anthropic · Reading: 6 min · Source: AI-synthesised

The narrowing human role as AI absorbs execution: choosing which problems matter, which results to trust, and when an approach is a dead end; the top rung of the autonomy ladder, and the open question of whether taste is 'just another capability' AI fails at then masters

Summary

As AI absorbs the execution of AI development (AI Accelerating AI Development), the human role contracts toward a residue the essay When AI builds itself calls research taste and judgment: "choosing which problems matter, which results to trust, and when an approach is a dead end." This is the top rung of the autonomy ladder and the single capability that, if it stays human, keeps recursive self-improvement from closing the loop. Whether it stays human is the essay's load-bearing uncertainty.

The narrowing role

The essay's clearest statement of the dynamic: "The human role is narrowing at each step in the AI development process." Two concrete narrowings:

  • Writing → reviewing. "Once human- and AI-authored code quality reach parity, humans will stop writing code entirely, and shift to only reviewing it." (Parity is reported as roughly now; see AI Accelerating AI Development.) This is the harness-shrinkage story told from the human side.
  • Running → choosing experiments. "Once Claude can run experiments, the question shifts towards 'Which of these experiments is worth running?'" The doing — writing the code, running the experiment, producing the result — "now costs almost nothing in human time, even if it still has costs in compute."

The residual human comparative advantage, "for now," is taste: deciding what's worth the compute. Put in the terms of Compute Allocator, the human becomes a compute allocator at the level of an entire research program rather than a single invocation.

Why it might not stay human: "just another capability"

The essay refuses to treat research taste as a permanent human moat. Two arguments push against it:

  1. Perspiration is automatable, and that's most of the work. Genius is "1% inspiration and 99% perspiration"; the 99% — scale it up, see what breaks, fix it, retry — is exactly what Claude excels at. Large-scale research progress "is mostly a function of tools and resources." (See The Bitter Lesson, Recursive Self-Improvement.)
  2. Taste shows the same capability curve as everything else. Early evidence of improving research judgment — Claude beating the human next-step choice 51%→64% on hard detour moments (AI Accelerating AI Development) — suggests "research taste might be just another AI capability that AI systems fail at for a time, then get good at." The precedent: AI was once bad at, then good at, explaining why a joke is funny, demonstrating theory of mind, and solving linguistic riddles — qualitative skills assumed to be human-shaped. This is the optimistic face of Jagged Intelligence (Ghosts, Not Animals): today's valley is tomorrow's peak.

The honest counter, in an Anthropic employee's words: "The comparative advantage of humans as of right now is still in seeing the bigger picture and thinking beyond the confines of the immediate task." The question is whether "as of right now" describes a stable equilibrium or a receding frontier.

The bottleneck either way

Even if taste stays human, it becomes the binding constraint: the Amdahl's-law bottleneck of the whole pipeline. If humans can't review code as fast as Claude generates it, "human review will become the bottleneck to AI development." And if humans spend most of their time on the single-digit fraction of work that is direction-setting, each human steers vastly more work, so the quality and throughput of human judgment become the scarce resource, exactly as Verification as the New Bottleneck predicts for verification and the HBR accountability-redesign work predicts for oversight. The risk is that the human nominally decides but actually rubber-stamps, with the "not close to substituting for senior researchers" judgment quietly eroding into a formality.
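
The Amdahl's-law framing can be made concrete with a minimal Python sketch. It assumes the work splits cleanly into a human-only share (direction-setting and review) and a share that AI accelerates by some factor. The function names and the 5% human share are illustrative assumptions, not figures from the essay.

```python
def amdahl_speedup(human_fraction: float, ai_speedup: float) -> float:
    """Overall speedup when only the non-human share of the work is accelerated.

    human_fraction: share of total work that stays at human speed (0..1).
    ai_speedup: factor by which the remaining share is accelerated (> 0).
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    if ai_speedup <= 0:
        raise ValueError("ai_speedup must be positive")
    automated_fraction = 1.0 - human_fraction
    return 1.0 / (human_fraction + automated_fraction / ai_speedup)


def speedup_ceiling(human_fraction: float) -> float:
    """Limit of amdahl_speedup as ai_speedup grows without bound."""
    return float("inf") if human_fraction == 0 else 1.0 / human_fraction


# Hypothetical numbers: 5% of the work remains human judgment.
for factor in (10, 100, 1000):
    print(f"AI speedup {factor:>4}x -> overall {amdahl_speedup(0.05, factor):.1f}x")
print(f"ceiling -> {speedup_ceiling(0.05):.1f}x")
```

Under these assumptions, making execution faster yields diminishing returns: with a 5% human share the overall speedup can never exceed 20x, however fast the automated share becomes. Past that point, only faster or better human judgment moves the total.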

The human cost (the quieter thread)

The essay includes unusually candid employee quotes about what the narrowing feels like — worth recording because they name a cost the productivity charts don't:

  • The collapse of the gift economy of small favors: asking a colleague "can you help me get this script running?" "created a little debt, a little mutual awareness. [Claude is] faster, it creates zero debt, but each of these is a lost bid for human collaboration."
  • The vertigo of contingent relevance: "On days where everything works well, I can't help but think nothing I do matters … But then there are days where everything breaks and I don't understand why and I realize I have no idea what I've been up to anymore."

Open questions

  • Is research taste a genuine ceiling (an architectural capability scaling can't reach) or the next jagged valley to fill? The essay calls this the decisive unknown.
  • If taste is automatable, what — if anything — remains a durable human comparative advantage in AI development?
  • How do you measure rubber-stamping? "Humans set direction" can be true on paper while real judgment quietly transfers to the model.

Sources

  • When AI builds itself — §"What might the future of work at Anthropic look like?" and §"What if we're wrong?"
About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
