How likely is extinction from superintelligent AI?

Predicting events that are unlike anything that has ever happened is hard, but producing educated guesses on such subjects is necessary. Various people have tried putting numbers on their informed guesses of the chance of superintelligence leading to existential catastrophe, giving subjective estimates ranging from under 1% to over 90%.

Eliezer Yudkowsky and Nate Soares at the Machine Intelligence Research Institute (MIRI) are examples of researchers who give high probabilities of extinction. In Yudkowsky’s view, humanity is on the bad end of a logistic success curve: because our response to the problem is seriously inadequate in multiple ways, any individual improvement won’t do much good by itself. By their mainline models, we’d need to “move up the curve” by doing better in several dimensions before we’d start seeing our probability of survival increase by a noticeable amount from the present ~0%.
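The shape of this argument can be illustrated with a minimal sketch. It assumes survival probability follows a standard logistic function of a single combined "effort" score; the score and the specific values below are hypothetical and chosen only for illustration, not taken from MIRI's models.

```python
import math


def survival_probability(effort: float) -> float:
    """Standard logistic function of a hypothetical combined effort score."""
    return 1.0 / (1.0 + math.exp(-effort))


# Hypothetical scores: far down the curve versus near its midpoint.
# Both cases improve the effort score by the same amount (2 units).
low_end_gain = survival_probability(-8.0) - survival_probability(-10.0)
midpoint_gain = survival_probability(1.0) - survival_probability(-1.0)

print(f"gain from a 2-unit improvement at the low end:  {low_end_gain:.6f}")
print(f"gain from a 2-unit improvement near the middle: {midpoint_gain:.6f}")
```

Under these assumptions, the same improvement that barely registers at the low end of the curve produces a large change near the middle, which is the sense in which several improvements would be needed before the probability of survival visibly moves.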

Others, including Paul Christiano and Katja Grace, give lower but still substantial probabilities of extinction (20%¹ and 19%², respectively).

Joe Carlsmith wrote a report that offers a framework for calculating the probability of power-seeking AI causing an existential catastrophe. The calculation involves multiplying factors like "how likely are AI systems to be agentic?" and "how likely is a warning shot?". Carlsmith gave a final estimate of >10%; various reviewers used the same model to come up with different probabilities.
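As a rough sketch of how this kind of multiplicative model works, the snippet below chains conditional probabilities together. The premise names and all numbers are hypothetical placeholders, not Carlsmith's or any reviewer's actual estimates; each factor is assumed to be conditional on the ones before it.

```python
from math import prod


def chained_probability(factors: dict[str, float]) -> float:
    """Multiply conditional probabilities, each assumed conditional on the previous ones."""
    for name, p in factors.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return prod(factors.values())


# Hypothetical placeholder values, NOT estimates from the report.
forecaster_a = {"premise_1": 0.5, "premise_2": 0.5, "premise_3": 0.5}

# A second hypothetical forecaster who disagrees on only one premise.
forecaster_b = {"premise_1": 0.5, "premise_2": 0.5, "premise_3": 0.1}

estimate_a = chained_probability(forecaster_a)
estimate_b = chained_probability(forecaster_b)

print(f"forecaster A: {estimate_a:.3f}")
print(f"forecaster B: {estimate_b:.3f}")
```

Because the factors multiply, disagreement on a single premise changes the final number proportionally, which is one way the same framework can yield quite different estimates for different people.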

The Forecasting Research Institute ran a study in 2022, called the “Existential Risk Persuasion Tournament” (XPT), in which they gathered predictions from two groups: “experts” on AI and other fields relevant to existential risk, and “superforecasters” who had performed well on short-term geopolitics predictions. The probability of AI causing catastrophe (defined as killing at least 10% of the population in a 5-year period), according to the median expert and superforecaster, was 12% and 2% respectively; for AI causing human extinction, it was 3% and 0.4%.³ ⁴

Though the range of estimates is wide, even those at the low end are worryingly high. Ben Garfinkel estimates the existential risk from power-seeking AI by 2070 at only 0.4%, but nevertheless believes major efforts are justified to understand and reduce it.

¹ Paul Christiano gives 20% as his guess for “Probability that most humans die within 10 years of building powerful AI (powerful enough to make human labor obsolete)”, alongside guesses of 22% for “Probability of AI takeover” and 46% for “humanity irreversibly mess[ing] up our future within 10 years of building powerful AI”.
² Katja Grace gives 19% as her “overall probability of doom”, which includes some non-extinction scenarios. Her talk outlines her overall model.
³ The risk estimates of XPT participants, especially superforecasters, are much lower than those of most researchers in the field. This may be largely because they’re confident that AI won’t have transformative consequences soon: XPT superforecasters predicted only a 3.75% chance of transformative AI, defined as at least 15% economic growth in a year, by the year 2070. On nearer-term questions that will resolve soon, XPT participants appear to have strongly underpredicted the speed of AI progress. For example, they underestimated progress on the MATH and MMLU benchmarks, and both computing power available per dollar and money spent on training runs look set to easily exceed the XPT superforecasters’ 95th-percentile estimates.
⁴ Even after some discussion, the estimates of both groups did not converge.