Deceptive alignment

Parent Tag: deception
Arbital Page

Canonically answered

How quickly could an AI go from the first indications of problems to an unrecoverable disaster?


If the AI system were deceptively aligned (i.e. pretending to be aligned until it was in a position of control) or had been operating in stealth mode while putting the pieces in place for a takeover, the transition could quite possibly take only hours. We might get more warning with weaker systems, if the AGI does not feel threatened by us at all, or if a complex ecosystem of AI systems is built up over time and we gradually lose control.

Paul Christiano has written a story of alignment failure that depicts a relatively fast transition.
