Frontier Pause Verification

Sources

When AI builds itself

Summary

This is the governance response proposed in When AI builds itself: if the recursive self-improvement (RSI) trajectory holds, the world should at least have the option to slow or temporarily pause frontier AI development so that societal structures and alignment research can keep up. But a pause is only useful if it is credible, meaning multilateral and verifiable, because a unilateral pause merely changes who leads. The Anthropic Institute's stated agenda is to build the systems a credible slowdown would require. This is the policy bookend to the internal deployment brake of the Responsible Scaling Policy (RSP): the RSP gates one lab's releases, whereas pause verification is a coordination problem between labs and between nations.

Why a unilateral pause isn't enough

Anthropic's position: "if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe." A unilateral pause by one lab "is achievable immediately, but accomplishes much less: it would change who the front-runner is, but it would not create the wider deliberative process that is currently missing." Anthropic says it would slow or temporarily pause if other frontier-or-near-frontier developers did so in a verifiable manner — making verification the linchpin.

Why verification is unusually hard for AI

A credible pause needs multiple well-resourced labs, in multiple countries, to agree to stop under the same conditions, with each able to verify that the others have actually stopped. With AI, even detectability (a lower bar than full verifiability) is harder to achieve than it is for other technologies:

Training runs are easier to conceal than missile silos: there is no large physical signature to observe.
Inputs are general-purpose. Compute, data, and talent aren't weapons-specific, so you can't gate the precursors the way you can with, say, fissile material.
The incentive to defect quietly is enormous — "whoever continues while others pause could inherit the lead."
A credible pause must also specify what triggers it, what lifts it, and who adjudicates. None of these is defined today.
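The defection incentive described above can be made concrete with a toy model. The sketch below is only an illustration and does not come from the essay: it assumes a single lab deciding whether to keep training quietly while the others pause, and the names `PauseGame`, `lead_gain`, `detection_prob`, and `penalty` are hypothetical parameters rather than estimates of anything real. Under those assumptions, pausing is a best response only when the expected cost of being caught is at least as large as the gain from inheriting the lead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PauseGame:
    """Toy one-shot model of one lab's choice while the others pause.

    All fields are hypothetical illustration parameters, not estimates.
    """

    lead_gain: float       # payoff from continuing while others pause
    detection_prob: float  # chance that quiet continuation is detected
    penalty: float         # cost imposed on a detected defector

    def defect_payoff(self) -> float:
        # Expected payoff of quietly continuing, relative to pausing (= 0).
        return self.lead_gain - self.detection_prob * self.penalty

    def pause_is_stable(self) -> bool:
        # Pausing is a best response only if defecting does not pay.
        return self.defect_payoff() <= 0

    def min_detection_prob(self) -> float:
        # Smallest detection probability at which defecting stops paying.
        # A value above 1 means no detection rate is enough at this penalty.
        if self.penalty <= 0:
            return float("inf")
        return self.lead_gain / self.penalty


# Same incentives, with and without a working detection mechanism.
unverified = PauseGame(lead_gain=10.0, detection_prob=0.0, penalty=50.0)
verified = PauseGame(lead_gain=10.0, detection_prob=0.5, penalty=50.0)

print(unverified.pause_is_stable())  # False
print(verified.pause_is_stable())    # True
```

With zero detection probability the penalty is irrelevant, however large, which is the sense in which verification is the linchpin: in this toy model it is the only term that can offset the gain from continuing.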

The precedent and the time problem

A verifiable pause is "not necessarily impossible in principle": the world has built verification regimes for complex technologies before, e.g. the Intermediate-Range Nuclear Forces (INF) Treaty. But those regimes "took decades to build both the infrastructure and the trust," and on the RSI timeline "we don't have that long." Hence the Institute's bet: start building the detectability and verification infrastructure now, ahead of any agreement, so that the option exists when it is needed. In the coming months Anthropic plans to convene policymakers, researchers, civil society, and other AI companies, and to publish the output, explicitly inviting voices from outside AI companies into the deliberation.

Connections

Recursive Self-Improvement — the trajectory that makes a pause option worth building; this is its governance response
Responsible Scaling Policy Evaluations — the single-lab deployment brake; pause verification is the multilateral counterpart
AI Accelerating AI Development — the compounding-acceleration evidence that makes "we don't have decades" the operative constraint
Agentic Misalignment (AM) — losing control is the downside a credible pause is meant to hedge against

Open questions

What does an AI-training "verification regime" concretely consist of — compute-accounting, datacenter inspection, hardware attestation, on-chip telemetry? The essay names the problem, not the mechanism.
Detectability is a lower bar than verifiability, but can even detection be made reliable when training runs leave no large physical signature and inputs are dual-use?
Who adjudicates triggers and lifts? No institution currently holds that mandate, and standing one up could itself take decades, judging by how long earlier verification regimes took to build.
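As one illustration of the compute-accounting option named in the first question, the sketch below estimates the training compute of a declared run and compares it with a threshold. It is not a mechanism proposed in the essay. It assumes the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token, and `HYPOTHETICAL_THRESHOLD_FLOPS` is a placeholder chosen only for illustration.

```python
# Placeholder threshold for illustration only; not taken from the essay.
HYPOTHETICAL_THRESHOLD_FLOPS = 1e26


def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate dense-transformer training compute as 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens


def exceeds_threshold(
    n_params: float,
    n_tokens: float,
    threshold: float = HYPOTHETICAL_THRESHOLD_FLOPS,
) -> bool:
    """Return True if a declared run's estimated compute meets the threshold."""
    return estimated_training_flops(n_params, n_tokens) >= threshold


# Two hypothetical declared runs: (parameters, training tokens).
small_run = (7e9, 2e12)    # roughly 8.4e22 FLOPs
large_run = (1e12, 2e13)   # roughly 1.2e26 FLOPs

print(exceeds_threshold(*small_run))  # False
print(exceeds_threshold(*large_run))  # True
```

The arithmetic is the easy part. The sketch takes the declared parameter and token counts on trust, and checking those declarations against what was actually run is exactly the verification problem the questions above leave open.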

Sources

When AI builds itself — §"What should we do?" (verifiable multilateral pause; detectability vs verifiability; INF Treaty precedent; Anthropic Institute convenings)