Isn’t AI just a tool like any other? Won’t it just do what we tell it to?

It is true that a computer program will always do what it is programmed to do. However, the deep learning that underlies most of today’s frontier AI¹ makes it hard to ensure that the resulting model will do what we wanted it to do. AIs differ from other tools such as calculators or word processors in two ways: they are black boxes, whose internal workings we cannot easily inspect, and they exhibit a kind of agency, in the sense that they are goal-oriented. Together, these properties make the behavior of AI models hard to predict, and current AI systems sometimes surprise their designers with the ways they achieve their objectives. This happens for two main reasons:

  1. We may inadequately specify what we want the AI to do. With other tools, we are the optimizer, so we can interact with the tool continuously and adjust how we use it in real time. An AI that is itself an optimizer will optimize for its literal instructions, which it may well fulfill in ways we don’t want, for example by producing side effects we neglected to explicitly tell it to avoid.

  2. Many behaviors are useful for attaining almost any goal, and an AI might discover on its own that they are instrumental to the goals we give it. These include gaining power, deceiving humans, and blocking attempts to turn it off.

It may be possible to design an AI so that it functions only as a tool, leaving choices and actions to human beings. But even such a system is likely to become more agent-like and to act independently toward goals, since doing so would let it function more effectively as a tool. Furthermore, even if such a design is possible, some AI companies have explicitly declared that their goal is to develop agentic AI.

  1. This was not always true: earlier approaches to artificial intelligence, such as expert systems, were more interpretable but, alas, did not work very well.