|Main Question: How might an AI takeover happen?|
|Alignment Forum Tag|
A threat model is a story of how a particular risk (e.g. AI) plays out.
Combination of a development model that says how we get AGI and a risk model that says how AGI leads to existential catastrophe.
If the AI system was deceptively aligned (i.e. pretending to be nice until it was in control of the situation) or had been in stealth mode while getting things in place for a takeover, quite possibly within hours. We may get more warning with weaker systems, if the AGI does not feel at all threatened by us, or if a complex ecosystem of AI systems is built over time and we gradually lose control.
Paul Christiano writes a story of alignment failure which shows a relatively fast transition.
An unaligned AI would not eliminate humans until it had replacements for the manual labor they provide to maintain civilization (e.g. a more advanced version of Tesla's Optimus). Until that point, it might settle for technologically and socially manipulating humans.
If programmed with the wrong motivations, a machine could be malevolent toward humans, and intentionally exterminate our species. More likely, it could be designed with motivations that initially appeared safe (and easy to program) to its designers, but that turn out to be best fulfilled (given sufficient power) by reallocating resources from sustaining human life to other projects. As Yudkowsky writes, “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
Since weak AIs with many different motivations could better achieve their goal by faking benevolence until they are powerful, safety testing to avoid this could be very challenging. Alternatively, competitive pressures, both economic and military, might lead AI designers to try to use other methods to control AIs with undesirable motivations. As those AIs became more sophisticated this could eventually lead to one risk too many.
Even a machine successfully designed with superficially benevolent motivations could easily go awry when it discovers implications of its decision criteria unanticipated by its designers. For example, a superintelligence programmed to maximize human happiness might find it easier to rewire human neurology so that humans are happiest when sitting quietly in jars than to build and maintain a utopian world that caters to the complex and nuanced whims of current human neurology.
If we pose a serious threat, it could hack our weapons systems and turn them against us. Future militaries are much more vulnerable to this due to rapidly progressing autonomous weapons. There’s also the option of creating bioweapons and distributing them to the most unstable groups you can find, tricking nations into WW3, or dozens of other things an agent many times smarter than any human with the ability to develop arbitrary technology, hack things (including communications), and manipulate people, or many other possibilities that something smarter than a human could think up. More can be found here.
If we are not a threat, in the course of pursuing its goals it may consume vital resources that humans need (e.g. using land for solar panels instead of farm crops). This video goes into more detail:
Link to https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/ at the end
We don't have AI systems that are generally more capable than humans. So there is still time left to figure out how to build systems that are smarter than humans in a safe way.
Unanswered non-canonical questions