SlimeBunnyBat's Answer to Andy Gee's question on Mesa-Optimizers 2
Trying to hide information from an AGI is almost certainly not an avenue towards safety - if the agent is better at reasoning than we are, it is likely to derive information relevant to safety considerations that we wouldn't think to hide. It is entirely appropriate, then, to use thought experiments like these, where the AGI has access to such a large depth of information, because our goal should be to design systems that behave safely even in such permissive environments.
At 3:54 you mention providing the whole of Wikipedia as learning data. Wikipedia details several methods for breaking memory containment. If this is provided to an advanced AI, couldn't that AI become aware that it may be constrained within blocks of memory, and thus attempt to bypass those constraints to maximize its reward function?
These vulnerabilities were present in all Intel and AMD CPUs for 20+ years before discovery and have since been largely mitigated. However, the "concept" of looking for vulnerabilities in microarchitecture is something an AI can do a lot better than humans can. If you read the assembly for pre-forking in Intel chips, it's pretty obvious that the entire memory space is available while the CPU is predicting what will be required of it next.
Presuming containment of an AI system is important, isn't feeding it massive datasets a considerable risk, not only to intellectual property rights but also to maintaining control of the AI?
Here are some examples of existing vulnerabilities; who knows how many more there are?
Original by: Damaged (edits by Stampy, Aprillion, plex)