Why can't we just use a friendly AI to stop bad AIs?
Some people, including Yann LeCun, have suggested that misuse of AI could be countered by deploying a friendly, defensive AI to stop bad actors’ AIs.1
This plan holds some promise, but requires a few things to go right:
- We need to learn how to reliably align such a defensive AI so that it is robustly “good”. Currently, we don’t know how to do this.
- The defensive AI must consistently maintain a strong strategic advantage over any unaligned AI. Whether that’s likely depends on considerations like whether the relevant technologies favor offense or defense (e.g., infecting people might be intrinsically much easier than curing them), and how much of an alignment tax the defensive AI faces.
A collection of such defensive AIs, if they coordinate, might amount to what Dan Hendrycks calls an “AI Leviathan”.2
This argument might also apply to misaligned AI. ↩︎
The name is a reference to Thomas Hobbes’ book of the same name. ↩︎