What is an agent?


Informally, an "agent" is something whose actions can be understood as directed at achieving a goal.

One way of identifying agency is to ask how a system would behave if its environment changed. For example, imagine a system that gives advice, and suppose that (for whatever reason) people start consistently doing the opposite of what it suggests. A non-agentic system will continue to give the same advice, but an agent whose goal is for people to succeed will look for other ways of achieving that goal. It might, for instance, start giving the opposite advice.
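The advice thought experiment can be sketched as a toy program. This is only an illustration under strong assumptions: the two actions, the `follower` and `contrarian` listeners, and both advisor classes are hypothetical names invented here, and the goal-directed advisor is simply handed a perfect model of how its listener reacts.

```python
# Toy model of the advice example. All names here are illustrative assumptions.
ACTIONS = ("save", "spend")
GOOD_ACTION = "save"  # assumed: the action that actually makes people successful


def follower(advice: str) -> str:
    """A listener who does what they are told."""
    return advice


def contrarian(advice: str) -> str:
    """A listener who does the opposite of what they are told."""
    return "spend" if advice == "save" else "save"


class FixedAdvisor:
    """Non-agentic: always outputs the same advice, whatever the environment."""

    def advise(self, listener) -> str:
        return GOOD_ACTION


class GoalDirectedAdvisor:
    """Agent-like: picks whichever advice leads the listener to the good action.

    Assumes the advisor can predict the listener's reaction exactly.
    """

    def advise(self, listener) -> str:
        return max(ACTIONS, key=lambda advice: listener(advice) == GOOD_ACTION)


def outcome(advisor, listener) -> str:
    """What the listener ends up doing after hearing the advice."""
    return listener(advisor.advise(listener))


for advisor in (FixedAdvisor(), GoalDirectedAdvisor()):
    for listener in (follower, contrarian):
        print(type(advisor).__name__, listener.__name__, outcome(advisor, listener))
```

With the `follower` listener the two advisors are indistinguishable. Only the change of environment to the `contrarian` listener separates them: the fixed advisor keeps saying "save" and the outcome flips, while the goal-directed advisor changes its advice and keeps the outcome the same.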

Some researchers focus on the system’s ability to act in the world on its own. For example, Richard Ngo argues that there are six factors which contribute to a system being agent-like:

Self-awareness: it understands that it’s a part of the world, and that its behavior impacts the world;

Planning: it considers a wide range of possible sequences of behaviors (let’s call them “plans”), including long plans;

Consequentialism: it decides which of those plans is best by considering the value of the outcomes that they produce;

Scale: its choice is sensitive to the effects of plans over large distances and long time horizons;

Coherence: it is internally unified towards implementing the single plan it judges to be best;

Flexibility: it is able to adapt its plans flexibly as circumstances change, rather than just continuing the same patterns of behavior.

These features are distinct properties of human beings that collectively describe human agency. From this perspective, agency isn’t binary; rather, it exists as a spectrum along each of these dimensions, and different systems could have any combination of them.
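One way to picture "a spectrum along each of these dimensions" is as a record with one score per factor rather than a single yes/no flag. The sketch below is a hypothetical representation, not something proposed in the source: the 0-to-1 scale, the `AgencyProfile` type, and the example scores are made up purely for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgencyProfile:
    """One score per factor, each on an assumed 0.0 to 1.0 scale."""

    self_awareness: float
    planning: float
    consequentialism: float
    scale: float
    coherence: float
    flexibility: float


def dimensions_where_higher(a: AgencyProfile, b: AgencyProfile) -> list[str]:
    """Names of the factors on which profile `a` scores strictly higher than `b`."""
    return [f.name for f in fields(a) if getattr(a, f.name) > getattr(b, f.name)]


# Made-up scores for two imagined systems; the numbers carry no empirical weight.
system_a = AgencyProfile(0.0, 0.0, 0.1, 0.0, 1.0, 0.1)
system_b = AgencyProfile(0.0, 0.9, 0.9, 0.1, 0.9, 0.5)

print(dimensions_where_higher(system_b, system_a))
print(dimensions_where_higher(system_a, system_b))
```

Because each profile is higher on at least one factor, neither imagined system is simply "more of an agent" than the other, which is the sense in which agency on this view is not binary.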

In addition to disagreements over which properties are part of agency, there are also disagreements over whether there is an objective distinction between agents and non-agents: is everything in the world either one or the other, or is agency (merely) a "stance" that an observer can take towards a system as a way of understanding and predicting its behavior? As an example of agency as a stance, we might think of another human being as an agent because we don’t know all of the psychological factors which go into their decisions; that is, we might interpret their behavior through understanding the goals they are pursuing. However, a superintelligence that was able to fully predict their behavior based on a mechanistic understanding of their psyche would not relate to them as an agent; it wouldn’t need to posit "goals" in order to understand their behavior.

The agent foundations research agenda focuses on defining and clarifying the properties of agents and identifying those properties in systems. However, because the term "agent" has a variety of definitions across various fields, some people^[1] think that a focus on "agents" per se can be misleading and instead focus on more crisply-defined entities like "optimizers."

[1] This includes the originators of the agent foundations research program.
