TapuZuko's Answer to Might an aligned superintelligence immediately kill everyone and then go on to create a "hedonium shockwave"?

From Stampy's Wiki

Answer text

I think an AI that is inner-aligned to a utility function of "maximize happiness minus suffering" is likely to do something like this.

"Inner-aligned" here means the AI is actually trying to do the thing we trained it to do, whether or not that is what we actually want.

"Aligned to what" is the outer alignment problem, and that is where the failure in this example lies. There is a lot of debate about which utility functions are safe or desirable to maximize, and about whether human values can even be described by a utility function.