Summary#
With Mythos 5 (the bio-safeguards-lifted form of Fable 5), Anthropic reports the first Claude results in which a model conducts novel scientific research largely on its own — choosing experimental moves, running domain tools, recovering from failures, and producing findings that match or beat skilled humans and recent published baselines. This is the wet-lab / life-sciences analogue of AI-Driven Formal Proof Search: where formal proof search has a Lean compiler as an instant verifier, science's verifier is the experiment — slower and more expensive — so the claims here are empirical demonstrations and selected examples, not compiler-checked guarantees. The results are the sharpest evidence yet for the less-conservative reading of recursive self-improvement: that "perspiration is becoming automated" reaches into discovery itself, and that research taste may be "just another capability AI fails at for a time, then gets good at."
The three results#
Drug / protein design — autonomy at human level#
Anthropic's internal protein-design experts accelerated aspects of drug design "by around 10 times" using Mythos 5. In one study, Mythos 5 was equipped with protein-design and bioinformatics tools but given no human assistance, and it matched or beat skilled human operators, executing "all of the tasks normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way." Of 14 protein targets, 9 yielded strong drug-design candidates that are now under investigation (immune checkpoints, growth-factor/receptor signaling, neurodegeneration, muscle disease, harder structural targets).
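As a rough aid to reading the "9 of 14" figure, the sketch below computes a Wilson score interval for that proportion. It is an illustration only, not part of Anthropic's analysis. It assumes the 14 targets can be treated as independent trials, which the source does not claim (the targets were selected, not randomly sampled). The function name `wilson_interval` is a hypothetical helper introduced here.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    Assumption (not from the source): trials are independent.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# The only counts used are the ones reported in the text: 9 of 14 targets.
low, high = wilson_interval(9, 14)
print(f"9/14 = {9 / 14:.0%}, approx. 95% Wilson interval: {low:.0%} to {high:.0%}")
```

Under these assumptions the interval runs from roughly 39% to 84%, which is one way to see why a 14-target study is better read as a demonstration than as a precise success rate.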
Novel hypotheses — preferred over Opus-class, one corroborated#
Mythos 5 is Anthropic's "first model to consistently produce novel, compelling scientific hypotheses." In blinded head-to-head comparisons against Opus-class models, Anthropic scientists preferred Mythos's molecular-biology hypotheses ~80% of the time, and advanced several to experimental evaluation. One Mythos hypothesis — a novel mechanism for an E. coli protein — was independently corroborated by a study from a lab working on the same problem.
Genomics — a week of autonomy beating a published model at 100× smaller#
Over "more than a week of largely autonomous work," Mythos 5 assembled single-cell data for millions of cells across 138 animal species, then designed and trained a custom machine-learning model to identify cells performing the same role in even distantly related organisms. With only high-level human input, that trained model outperformed a recent model published in Science — despite being 100× smaller. Anthropic intends to publish.
The dual-use shadow#
The same capability is why biology must be safeguarded in the general-access Fable 5. The motivating evaluation is predicting how a genetic modification affects assembly of the adeno-associated virus (AAV) capsid, a real gene-therapy component; the ability to design it, "in the wrong hands, could enable the design of dangerous viruses." Mythos-class models outperformed dedicated protein-language models on this task without being trained for it, using biological reasoning alone. Autonomous scientific capability and bio-uplift risk are the same capability seen from two sides. This is the core tension that the RSP's CB (chemical/biological) determination and the bio classifier exist to manage.
Why it matters for the trajectory#
- Perspiration automation reaches discovery. The essay "When AI builds itself" argued that most research progress is incremental "scale-it-up-see-what-breaks-fix-it" work of the kind Claude excels at. Autonomous genomics (assemble data, design a model, train it, beat the baseline) is that loop run end-to-end in a science domain, not just in engineering.
- It chips at the taste moat. To "consistently produce novel, compelling scientific hypotheses" and to work with "only high-level human input" are exactly the direction-setting functions presumed to stay human. The ~80% blinded preference is a concrete crack, though it was measured against Opus-class models rather than human scientists, was judged by humans, and is internally sourced.
- Still jagged, still gated by verification. These are curated demonstrations of a jagged capability (see Jagged Intelligence (Ghosts, Not Animals)). Science's verifier is slow wet-lab confirmation, not a compiler, so unlike the results in AI-Driven Formal Proof Search these cannot be auto-validated; they await experimental confirmation and peer review. This keeps the work adjacent to, but below, the AI-R&D autonomy threshold that Anthropic gates on.
Connections#
- AI-Driven Formal Proof Search — the formal-math sibling: AI doing novel research, but with an instant compiler-verifier; science substitutes the (slow, costly) experiment, so verification is the harder bottleneck here
- Recursive Self-Improvement — the clearest wet-lab evidence for "perspiration is becoming automated," the essay's less-conservative reading
- Research Taste as the Human Bottleneck — autonomous hypothesis-generation and "only high-level human input" are direct chips at the residual human comparative advantage
- AI R&D Autonomy Evaluation (AECI) — adjacent autonomy: a model designing+training a model and beating a published baseline is AI-R&D-shaped, though in genomics rather than AI itself
- Task Time-Horizon Scaling — "over a week of largely autonomous work" is a concrete long-horizon datapoint beyond Mythos Preview's measured 16h
- Jagged Intelligence (Ghosts, Not Animals) — the caveat: these are selected demonstrations of a still-jagged capability, not uniform competence
- The Verifiability Thesis — the limiting case: science is less verifiable than Lean proof, so autonomy outruns cheap verification — the experiment, not a compiler, is the reward signal
- Capability-Gated Model Fallback — the dual-use flip side; the AAV result is the bio classifier's motivating example
- Responsible Scaling Policy Evaluations — the CB (chemical/biological) risk domain these capabilities advance
- Claude Mythos 5 — the model (bio safeguards lifted) that produced these results
- Claude Fable 5 — the general-access sibling on which biology is safeguarded
Open questions#
- Every result is Anthropic-reported and example-selected, and the genomics claim (a model 100× smaller beating one published in Science) is unpublished: Anthropic says only that it intends to publish. What survives external peer review?
- Science's verification gap: the formal-proof loop self-validates; here a wrong-but-confident hypothesis costs a wet-lab cycle to falsify. Does autonomy without a fast verifier increase the verification bottleneck rather than relieve it?
- If Mythos's hypotheses are genuinely preferred ~80% of the time, how much of "research taste" is left as a distinctively human function, and how would you measure the residue?
Sources#
- Claude Fable 5 and Claude Mythos 5 — §"Evaluating Claude Fable 5 and Claude Mythos 5" (drug design; novel hypotheses; genomics) and §"Biology and chemistry" (AAV dual-use)
