Androkguz's question on Mesa-Optimizers
OK. Doesn't this problem require the system to be self-aware, knowing that it will get a deployment phase and being able to estimate the training schedule?
Is there any AI nowadays that is in a similar situation?
Wouldn't this be "solved" if we "just" never allowed it to know that there will be a situation where the base optimizer is no longer acting on it?
Yes. To carry out this kind of deception, a system must be self-aware to the degree that it understands its goal can be changed. There are no self-aware AIs today (that we know of).

Trying to fool an AGI into thinking that it is always being base-optimized works only as long as we can outsmart it. Eventually, this strategy amounts to trying to outsmart a superintelligence (unless one takes the route of making the system less capable in exchange for safety), and that is a losing strategy by definition.
Origin
Where was this question originally asked?
|YouTube (comment link)|
|On video:||The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment|
|Asked on Discord?||Yes|