|Main Question: Why might an AI system deceive its programmers or users?|
|Child tag(s): deceptive alignment|
|Alignment Forum Tag|
Deception is the act of sharing information in a way which intentionally misleads others.
Talking about full AGI: Fairly likely, but depends on takeoff speed. In a slow takeoff of a misaligned AGI, where it is only weakly superintelligent, manipulating humans would be one of its main options for trying to further its goals for some time. Even in a fast takeoff, it’s plausible that it would at least briefly manipulate humans in order to accelerate its ascent to technological superiority, though depending on what machines are available to hack at the time it may be able to skip this stage.
If the AI's goals include reference to humans it may have reason to continue deceiving us after it attains technological superiority, but will not necessarily do so. How this unfolds would depend on the details of its goals.
Eliezer Yudkowsky gives the example of an AI solving protein folding, then mail-ordering synthesised DNA to a bribed or deceived human with instructions to mix the ingredients in a specific order to create wet nanotechnology.
People tend to imagine AIs as being like nerdy humans – brilliant at technology but clueless about social skills. There is no reason to expect this – persuasion and manipulation is a different kind of skill from solving mathematical proofs, but it’s still a skill, and an intellect as far beyond us as we are beyond lions might be smart enough to replicate or exceed the “charming sociopaths” who can naturally win friends and followers despite a lack of normal human emotions.
A superintelligence might be able to analyze human psychology deeply enough to understand the hopes and fears of everyone it negotiates with. Single humans using psychopathic social manipulation have done plenty of harm – Hitler leveraged his skill at oratory and his understanding of people’s darkest prejudices to take over a continent. Why should we expect superintelligences to do worse than humans far less skilled than they?
More outlandishly, a superintelligence might just skip language entirely and figure out a weird pattern of buzzes and hums that causes conscious thought to seize up, and which knocks anyone who hears it into a weird hypnotizable state in which they’ll do anything the superintelligence asks. It sounds kind of silly to me, but then, nuclear weapons probably would have sounded kind of silly to lions sitting around speculating about what humans might be able to accomplish. When you’re dealing with something unbelievably more intelligent than you are, you should probably expect the unexpected.
It could be more useful to prevent the use of simpler AIs to create a lot of wealth while causing harm to others. Legal obligations will be probably less relevant to a potentially deceptive super-intelligent AGI, but the symbolic meaning seems more likely to be beneficial than harmful for communicating human values, so not useless overall.
The problem is that the actions can be harmful in a very non-obvious, indirect way. It's not at all obvious which actions should be stopped.
For example when the system comes up with a very clever way to acquire resources - this action's safety depends on what it intends to use these resources for.
Such a supervision may buy us some safety, if we find a way to make the system's intentions very transparent.