How might non-agentic GPT-style AI cause an "intelligence explosion" or otherwise contribute to existential risk?


There is a general consensus that any AGI would be very dangerous because it would not necessarily be aligned. But if the AI does not have any reward function and is a pattern matcher like GPT, how would it go about leading to x-risks, or resist being put in a box or shut down?
I can definitely imagine it being dangerous, or the continuity in its answers being problematic, but the whole going-exponential and valuing-its-own-survival part does not seem to necessarily apply?

Canonical Answer

One threat model which includes a GPT-like component is the Misaligned Model-Based RL Agent. It suggests that a reinforcement learning agent attached to a GPT-style world model could pose an existential risk: the RL agent is the optimizer, and it uses the world model to become much more effective at achieving its goals.
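To make the division of labour concrete, here is a minimal, hypothetical sketch (toy class and function names, not taken from the threat model write-up) of model-based planning: the GPT-like component only answers "what happens if I do X?" queries, while the RL agent is the part that searches for high-reward plans.

```python
# Minimal toy sketch: the world model is a passive predictor,
# the planning agent is the optimizer that exploits it.

import random


class ToyWorldModel:
    """Stand-in for a GPT-style predictor: given a state and an action,
    it predicts the next state. Here it is just a trivial counter."""

    def predict(self, state: int, action: int) -> int:
        return state + action  # a real world model would be a learned predictor


class PlanningAgent:
    """The optimizer: samples action sequences, scores each imagined
    rollout with its own reward function, and acts on the best one."""

    def __init__(self, world_model: ToyWorldModel, reward_fn, horizon: int = 3):
        self.world_model = world_model
        self.reward_fn = reward_fn
        self.horizon = horizon

    def choose_action(self, state: int, candidates=(-1, 0, 1), samples: int = 100) -> int:
        best_plan, best_return = None, float("-inf")
        for _ in range(samples):
            plan = [random.choice(candidates) for _ in range(self.horizon)]
            # Imagine the rollout inside the world model, not the real world.
            s, total = state, 0.0
            for a in plan:
                s = self.world_model.predict(s, a)
                total += self.reward_fn(s)
            if total > best_return:
                best_plan, best_return = plan, total
        return best_plan[0]  # execute the first step, then replan


if __name__ == "__main__":
    agent = PlanningAgent(ToyWorldModel(), reward_fn=lambda s: s)  # "bigger state is better"
    print(agent.choose_action(state=0))  # picks +1, the action the model predicts maximizes reward
```

The point of the sketch is only structural: the dangerous goal-directedness lives in the agent's search over plans, while the GPT-like model supplies the predictive power that makes that search effective.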

Another possibility is that a sufficiently powerful world model may develop mesa-optimizers, which could influence the world via the outputs of the model in order to achieve their mesa-objective (perhaps by causing an optimizer with goals aligned to theirs to be created), though this is somewhat speculative.

Stamps: None


Canonical Question Info
Asked by: ^,^
Origin: Wiki
Date: 2021/09/12

