Why can't we just use a friendly AI to stop bad AIs?
People such as Yann LeCun, Chief AI Scientist at Meta, have suggested that misuse of AI[1] could be countered by some form of more powerful, friendly AI.
This plan holds some promise, but it requires a few things to go right:

- The friendly AI must be aligned in the first place, which we currently do not know how to reliably do.
- The friendly AI must constantly wield a strong strategic advantage over any unaligned AI. In a multipolar scenario, it is unclear whether offense or defense would be favored.
Dan Hendrycks proposed the idea of an AI Leviathan[2]: a collection of sufficiently aligned AIs that act to stop any misbehaving AI. The emergence of such a Leviathan could constitute a pivotal act. Alternatively, an extremely powerful singleton superintelligence could act to prevent any other AI systems from being created.
Such control by AI, whether from a singleton or a Leviathan, involves a significant concentration of power.
This argument might also apply to misaligned AI. ↩︎
The name is a reference to Thomas Hobbes’ book of the same name. ↩︎