Can we add "friendliness" to any artificial intelligence design?

Current AIs do not have a well-defined “slot” in which a goal (such as “be friendly to human interests”) could be placed. This limitation might extend to future, potentially superintelligent AIs. For example, an AI created via deep learning on neural networks, whole brain emulation, or evolutionary algorithms would end up with some goal, but that goal may be very difficult to predict or to set in advance.
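To make the “slot” metaphor concrete, here is a minimal, hypothetical sketch (the class and function names are invented for illustration). It contrasts an idealized agent whose goal is a separate, inspectable component with a learned policy whose objective is only implicit in its trained weights; the point is not any real system's design, just where a goal could or could not be pointed to.

```python
# Hypothetical sketch: an agent with an explicit goal "slot" versus a
# learned policy whose objective is smeared across its weights.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExplicitGoalAgent:
    """Idealized agent: the goal is a separate, swappable component."""
    goal: Callable[[str], float]  # a utility function we could replace

    def choose(self, options: List[str]) -> str:
        # Picks whichever option its goal slot scores highest.
        return max(options, key=self.goal)


class LearnedPolicyAgent:
    """Closer to current deep-learning systems: behavior comes from many
    trained weights, with no separate 'goal' object to edit."""

    def __init__(self, weights: List[float]):
        self.weights = weights  # any "goal" is implicit in these numbers

    def choose(self, options: List[float]) -> float:
        # Stand-in for a forward pass; nothing here is labeled "the goal".
        return max(options, key=lambda x: sum(w * x for w in self.weights))


# We can point to ExplicitGoalAgent.goal and say "make this friendliness";
# there is no analogous line to point to in LearnedPolicyAgent.
friendly = ExplicitGoalAgent(goal=lambda outcome: 1.0 if outcome == "help" else 0.0)
print(friendly.choose(["harm", "help"]))  # -> "help"
```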

Thus, in order to design a friendly AI, it is not sufficient to determine what “friendliness” is (and to specify it clearly enough that even a superintelligence will interpret it the way we want it to). We must also figure out how to build a general intelligence that pursues a specific goal at all, and that stably retains that goal as it edits its own code to make itself smarter. Some consider this task the primary difficulty in designing friendly AI.
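The second half of that task, stably retaining a goal through self-modification, can also be illustrated with a toy sketch. The code below is not any specific proposal; it assumes a made-up agent that only adopts a rewrite of its own decision procedure after spot-checking that the rewrite makes the same goal-directed choices. Passing finitely many tests is a far weaker guarantee than what a genuinely self-improving system would need, which is part of why the problem is considered hard.

```python
# Toy illustration: a self-modifying agent that accepts a "smarter" rewrite
# of itself only if the rewrite appears to pursue the same goal.

from typing import Callable, List

Goal = Callable[[str], float]
Decide = Callable[[List[str], Goal], str]


class SelfModifyingAgent:
    def __init__(self, goal: Goal, decide: Decide):
        self.goal = goal
        self.decide = decide  # current decision procedure

    def choose(self, options: List[str]) -> str:
        return self.decide(options, self.goal)

    def adopt_rewrite(self, new_decide: Decide,
                      test_cases: List[List[str]]) -> bool:
        # Accept the new code only if it agrees with the old code on every
        # test case. A finite spot-check like this does NOT guarantee the
        # goal is preserved in general -- that is the hard part.
        for case in test_cases:
            if new_decide(case, self.goal) != self.decide(case, self.goal):
                return False
        self.decide = new_decide
        return True


# Original (slow) decision procedure and a proposed faster rewrite.
slow: Decide = lambda opts, goal: sorted(opts, key=goal)[-1]
fast: Decide = lambda opts, goal: max(opts, key=goal)

agent = SelfModifyingAgent(goal=lambda o: float(len(o)), decide=slow)
print(agent.adopt_rewrite(fast, test_cases=[["a", "bbb", "cc"], ["x", "yy"]]))  # True
```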