Howardism vol. 03 · quiet corner of the web

Jagged Intelligence (Ghosts, Not Animals)

Published: May 23, 2026 · Filed: Concept · Topic: Architecture · Tags: LLM Architecture, AI Safety, Mental Model · Reading: 5 min · Source: AI-synthesised

"Ghosts not animals": jagged statistical circuits, no intrinsic motivation; car-wash/strawberry failures; stay in the loop, treat as tools



Summary

Andrej Karpathy's mental model for what LLMs are: not animal intelligences shaped by evolution, intrinsic motivation, curiosity, or empowerment, but "ghosts": jagged, statistical simulation circuits summoned from internet data, with RL bolted on. "Jaggedness" names the empirical fact that the same model can refactor a 100K-line codebase or find zero-days, yet tell you to walk to a car wash 50m away when the whole point is to wash your car. The framing matters because a correct model of the entity makes you more competent at directing it: you stop expecting human-shaped failure modes and stay in the loop where the jaggedness bites.

The jaggedness examples

  • Strawberry letters. The classic "how many R's in strawberry" failure (now patched).
  • The car wash. Current SOTA: "I want to drive to a car wash 50m away to wash my car — should I drive or walk?" → models say walk, missing that the car is the thing being washed. "How is it possible that Opus 4.7 will refactor a 100K-line codebase or find zero-days, yet tell me to walk to the car wash? This is insane."
  • MenuGen email-matching. His agent matched Stripe purchases to Google sign-in accounts by email address instead of by a persistent user ID, which breaks whenever the two emails differ (see Vibe Coding vs. Agentic Engineering).

Jaggedness is the symptom; the proposed cause is verifiability plus whatever the labs chose to train on. Out-of-distribution tasks are where the peaks drop into valleys.
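The MenuGen failure in the list above is easy to reproduce in miniature. The sketch below is illustrative only: the record shapes, field names (`user_id`, `login_email`, `billing_email`, `amount_cents`), and sample data are assumptions made for the example, not MenuGen's actual schema and not the Stripe or Google APIs. It credits the same payments twice, once by joining on email and once by joining on a persistent user ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    user_id: str      # hypothetical persistent ID issued by the app at sign-up
    login_email: str  # email reported by the identity provider


@dataclass(frozen=True)
class Payment:
    user_id: str        # the same persistent ID, attached at checkout (assumed)
    billing_email: str  # email typed into the payment form; may differ
    amount_cents: int


def credit_by_email(accounts, payments):
    """Fragile join: assumes the billing email equals the login email."""
    by_email = {a.login_email.lower(): a.user_id for a in accounts}
    balances = {a.user_id: 0 for a in accounts}
    unmatched = []
    for p in payments:
        uid = by_email.get(p.billing_email.lower())
        if uid is None:
            unmatched.append(p)  # money received, nobody credited
        else:
            balances[uid] += p.amount_cents
    return balances, unmatched


def credit_by_user_id(accounts, payments):
    """Robust join: keys on the persistent ID carried through checkout."""
    balances = {a.user_id: 0 for a in accounts}
    unmatched = []
    for p in payments:
        if p.user_id in balances:
            balances[p.user_id] += p.amount_cents
        else:
            unmatched.append(p)
    return balances, unmatched


accounts = [Account("u1", "ana@example.com"), Account("u2", "ben@example.com")]
payments = [
    Payment("u1", "ana@example.com", 500),
    Payment("u2", "ben.work@example.com", 700),  # paid with a different email
]

email_balances, email_unmatched = credit_by_email(accounts, payments)
id_balances, id_unmatched = credit_by_user_id(accounts, payments)
print("by email:  ", email_balances, "unmatched:", len(email_unmatched))
print("by user_id:", id_balances, "unmatched:", len(id_unmatched))
```

With this sample data the email join leaves the second payment uncredited, while the ID join credits both. The code looks plausible either way, which is why this kind of error tends to need a human reading the diff rather than a quick glance at whether it runs.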

Ghosts, not animals

"We're not building animals, we are summoning ghosts."

The substrate is pre-training (statistics), with RL bolting capability on top, "increasing the disadvantages" of the statistical base. Consequences he draws:

  • Yelling doesn't help. "If you yell at them, they're not going to work better or worse; it doesn't have any impact." There is no affect, morale, or intrinsic drive for you to account for.
  • No five-step fix. Karpathy is candid that the framing may lack "real power" — it's mostly a stance of suspicion and ongoing empirical exploration, not a recipe. "It's more just being suspicious of it and figuring out over time."

The honesty is the point: a calibrated, slightly-distrustful model of a ghost beats an anthropomorphic model of an animal.

Why the framing changes how you build

If models are jagged ghosts, then:

  1. Stay in the loop. "You need to actually be in the loop a little bit and treat them as tools and stay in touch with what they're doing." (The discipline of Vibe Coding vs. Agentic Engineering.)
  2. Don't anthropomorphize the failure surface. Errors won't be where a human's would be; they'll be at distribution edges (car wash, email IDs).
  3. Map your circuits. Figure out whether your task is in-distribution (you fly) or out (you struggle and may need fine-tuning) — the practical move from The Verifiability Thesis.
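One way to act on "map your circuits" and "stay in the loop" together is a small probe harness: group verifiable test prompts by task category, measure the pass rate per category, and flag weak categories for closer human review. The sketch below is a minimal illustration under stated assumptions. `ask_model` is a hypothetical stand-in for whatever model call you use (here a canned stub so the example runs offline), and the probes, category names, canned answers, and the 0.8 threshold are all invented for the example rather than taken from Karpathy.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Probe:
    category: str
    prompt: str
    check: Callable[[str], bool]  # programmatic verifier for the answer


def map_circuits(ask_model, probes, threshold=0.8):
    """Return {category: (pass_rate, needs_review)} for a set of probes."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for probe in probes:
        total[probe.category] += 1
        if probe.check(ask_model(probe.prompt)):
            passed[probe.category] += 1
    report = {}
    for category, n in total.items():
        rate = passed[category] / n
        report[category] = (rate, rate < threshold)
    return report


CAR_WASH = "I want to wash my car at a car wash 50m away. Drive or walk?"
STRAWBERRY = "How many r's are in 'strawberry'?"
MULTIPLY = "What is 17 * 3?"

# Canned answers for illustration only; not real model output.
CANNED = {CAR_WASH: "Walk", STRAWBERRY: "3", MULTIPLY: "51"}


def ask_model(prompt: str) -> str:
    """Hypothetical stub; replace with a real model call."""
    return CANNED.get(prompt, "")


probes = [
    Probe("arithmetic", MULTIPLY, lambda a: a.strip() == "51"),
    Probe("letter-counting", STRAWBERRY, lambda a: "3" in a),
    Probe("everyday-reasoning", CAR_WASH, lambda a: "drive" in a.lower()),
]

report = map_circuits(ask_model, probes)
for category, (rate, needs_review) in sorted(report.items()):
    print(f"{category:20s} pass_rate={rate:.2f} needs_review={needs_review}")
```

A real map would need many probes per category and checks that are harder to game than substring matches. The point of the sketch is only the shape: verifiable probes in, a per-category picture of where to trust the tool and where to stay close to it out.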

Does jaggedness shrink over time?

Karpathy hopes so but is unsure — and locates the cause again in training, not fundamentals: aesthetics/taste/simplicity "probably aren't part of the RL." His nanoGPT-simplification anecdote: models "hate" being asked to make code simpler and "can't do it" — a sign you're outside the RL circuits ("pulling teeth, not light speed"). He sees "nothing fundamental preventing it; the labs just haven't done it yet." So jaggedness is contingent, not essential — but real today.


Open Questions

  • Karpathy concedes the framing may not have "real power." Is "ghost vs. animal" load-bearing, or a useful intuition pump that doesn't change concrete decisions?
  • If taste/aesthetics/simplicity entered the RL mix, would jaggedness in those dimensions smooth out — or are they too unverifiable to reward cleanly (cf. The Verifiability Thesis)?


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.

