Preventing an AI from escaping by using a more powerful AI, gets points for creative thinking, but unfortunately we would need to have already aligned the first AI. Even if the second AI's only terminal goal were to prevent the first ai from escaping, it would also have an instrumental goal of converting the rest of the universe into computer chips so that it would have more processing power to figure out how to best contain the first AGI.
It might be possible to try to bind a stronger AI with a weaker AI, but this is unlikely to work as the stronger AI would have an advantage due to being stronger. Further, there is a chance that the two AI's end up working out a deal where the first AI decides to stay in the box and the second AI does whatever the first AI would have down if it were able to escape.
One of the main questions about simulation theory is why would a society invest a large quantity of resources to create it. One possible answer is an environment to train/test AI, or run it safely isolated from an outside reality.
It's a fun question but probably not one worth thinking about too much. This kind of question is impossible to get information from observations and experiments.
I think an AI inner aligned to optimize a utility function of maximize happiness minus suffering is likely to do something like this.
Inner aligned meaning the AI is trying to do the thing we trained it to do. Whether this is what we actually want or not.
"Aligned to what" is the outer alignment problem which is where the failure in this example is. There is a lot of debate on what utility functions are safe or desirable to maximize, and if human values can even be described by a utility function.
Autonomous weapons, especially a nuclear arsenal, being used by an AI is a concern, but this seems downstream of the central problem of giving an unaligned AI any capabilities to impact the world.
Triggering nuclear war is only one of many ways a power seeking AI might choose to take control. This seems unlikely, as resources the AI would want to control (or the AI itself) would likely be destroyed in the process.