What about having a human supervisor who must approve all the AI's decisions before executing them?

Non-Canonical Answers

The problem is that actions can be harmful in non-obvious, indirect ways, so it is not at all clear which actions a supervisor should veto.

For example, if the system comes up with a very clever way to acquire resources, the safety of that action depends on what it intends to use those resources for. A sketch of this failure mode follows below.
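The following is a minimal sketch of why per-action approval can fail: the supervisor reviews only the action's surface description, while the intent that determines its safety is internal to the system. All names here (Action, human_approves, supervised_execute) are hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Action:
    description: str  # the only thing the supervisor ever sees
    # The system's actual intent is internal state; it is not part of
    # the action object, so it cannot be reviewed.


def human_approves(action: Action) -> bool:
    """Simulates a supervisor who judges the action by its description alone."""
    return input(f"Approve '{action.description}'? [y/n] ").strip().lower() == "y"


def supervised_execute(action: Action, execute) -> None:
    """Gate execution behind human approval."""
    if human_approves(action):
        execute(action)
    else:
        print("Action vetoed.")


# The action looks benign in isolation, but whether it is safe depends on
# the hidden purpose the resources will serve, which never appears in what
# the supervisor reviews.
action = Action(description="Rent 10,000 cloud GPU instances")
supervised_execute(action, execute=lambda a: print(f"Executing: {a.description}"))
```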

Such supervision might buy us some safety if we can find a way to make the system's intentions transparent.

Canonical Question Info
Asked by: filip
Origin: Wiki
Date: 2021-08-09

