Androkguz's question on Mesa-Optimizers

From Stampy's Wiki

OK. Doesn't this problem require a self awareness of the system that it will get a deployment phase and an estimation of the training schedule?

Is there any Ai now a days that is in a similar situation?
Wouldn't this be "solved" if we "just" never allowed it to know that there's going to be a situation where it's not going to have the base optimizer?


Tags: None (add tags)

Non-Canonical Answers

Yes. A system must be self-aware to understand that its goal can be changed, to the degree that it can carry out such deception. There are no self-aware AI now (that we know of).Trying to fool an AGI into thinking that it's always being base-optimized works as long as we can outsmart it. Eventually, this strategy amounts to trying to outsmart a super-intelligence (unless one takes the route of making the system less capable in exchange for saftey), and that's a losing strategy by definition.

Stamps: None

Tags: None (add tags)

Question Info
Asked by: androkguz
OriginWhere was this question originally asked
YouTube (comment link)
On video: The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment
Date: 2021-02-23T14:17
Asked on Discord? Yes
Reply count: 2


Discussion