matbmp's question on Intro to AI Safety

How about deploying AGI in two phases? (1) Give the AI very limited capability for external action and set its goal to building a good internal model of the world ("good" also meaning very similar to a human's model of the world). (2) Set the AI's goal to identify itself as a human, and then unlock its action capability, perhaps gradually. Is there something missing in this line of thought, which sees intelligence not as the capability to accomplish goals but as the ability to make models of the world (Joscha Bach's idea)? I can see that merely observing human behaviour with the goal of emulating the human thought process could be a hard task, but with some help from human neuroscience, building a system capable of this should be easier. Alternatively, we could drop the unlocking from the second phase and benefit from the AI's existence only through communication (which would also let us verify its ideas and ask about details). This is all about how intelligence can be externalized. Is the intelligence of a mathematician (e.g. Newton) not valuable to us because the correct formulas he wrote on paper let some other human, not him, go to the Moon? Intelligence without goals is purposeless (by definition), but the goals don't have to be highly external; they can be about having good internal models (in humans: coherent, predictive models that allow efficient pattern recognition, among other things), which are very important to us. This applies if the "emulate human action" approach does not work for whatever reason.

Non-Canonical Answers

This seems very much like the "raising AI like kids" proposal.
One issue with it is that the AI could, instead of committing to learning human values by osmosis, emulate the desired values just long enough to be let off the leash.


matbmp
YouTube (comment link)
Intro to AI Safety, Remastered
2021/06/25
