Answer to Can we add friendliness to any artificial intelligence design?
Many AI designs that would generate an intelligence explosion would not have a ‘slot’ in which a goal (such as ‘be friendly to human interests’) could be placed. For example, if AI is made via whole brain emulation, or evolutionary algorithms, or neural nets, or reinforcement learning, the AI will end up with some goal as it self-improves, but that stable eventual goal may be very difficult to predict in advance.
Thus, in order to design a friendly AI, it is not sufficient to determine what ‘friendliness’ is (and to specify it clearly enough that even a superintelligence will interpret it the way we want it to). We must also figure out how to build a general intelligence that satisfies a goal at all, and that stably retains that goal as it edits its own code to make itself smarter. This task is perhaps the primary difficulty in designing friendly AI.
|Original by:||Luke Muehlhauser (edits by plex)|
|Based on:||MIRI's Intelligence Explosion FAQ|