Beacon of Wierd's question on Mesa-Optimizers


What if we just… don't provide something to optimize over?
Like the different GPT networks: we just feed it all the information and then ask it to do stuff?
I'm not saying this will magically make the AI good; I'm more or less just asking what would happen. Say we build an AI with the same basic interface to the world as we have: vision and hearing (let's ignore the other senses for now). Then we just train it to predict what comes next for all the data we have: every site on the web, every video on YouTube, etc. We could also put it in a humanoid body and ask it to predict its own movements and match them to how it has learnt that other humans move. If you ask it to "Go take out the trash", it will know what that means and understand how to do it from all the videos and such (assuming it has generalized and trained well). It will also know that it should avoid traffic, not harm children, and all that "fluffy stuff" that we just do on a regular basis without thinking about it.

I can't see how this kind of "prediction AI" would have much of a desire to destroy the world. Sure, you could ask it to do horrible things and it might comply, so it's not "safe" in that respect, but it might just as well say "YOU'RE A VILLAIN, I'M A HERO, YOU DIE NOW!" and kill you for giving it "unethical orders". If it reads ALL of mankind's work, it should generalize over all of that.
You could even give it the constant objective "Be a good person", and it would incorporate all the fluffy stuff of what a good person means to mankind, without you having to specify it.


Tags: None
Question Info
Asked by: Beacon of Wierd
Origin: YouTube (comment link)
On video: The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment
Date: 2021-03-01T09:48
Asked on Discord? Yes
Reply count: 2


Discussion
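
The setup described in the question is essentially self-supervised next-token prediction, which is the training objective behind the GPT models. A minimal sketch of that objective (assuming PyTorch; the model, data, and names below are illustrative toys, not any actual GPT training code):

```python
# Minimal sketch of next-token prediction, the objective the question
# describes ("train it to predict what comes next"). Assumes PyTorch;
# the tiny model and random data are illustrative placeholders.
import torch
import torch.nn as nn

VOCAB = 256    # byte-level vocabulary (toy choice)
CONTEXT = 32   # tokens of context per example

class TinyPredictor(nn.Module):
    """Embed tokens, run an LSTM, project back to vocabulary logits."""
    def __init__(self, vocab=VOCAB, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):                # (batch, seq)
        h, _ = self.rnn(self.embed(tokens))   # (batch, seq, dim)
        return self.head(h)                   # (batch, seq, vocab)

def train_step(model, opt, batch):
    """One self-supervised step: the target is the input shifted by one.
    No hand-written reward; the model just imitates the data."""
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = TinyPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randint(0, VOCAB, (8, CONTEXT + 1))  # stand-in for real text
print(train_step(model, opt, data))
```

Note that even in this setup something is still being optimized during training: the cross-entropy loss on the predictions. In that sense the outer training loop still provides an objective, which is where the video's mesa-optimizer concern applies.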