What is Vinge’s principle?

Vinge’s principle says that, in "rich" domains, a system that is less intelligent¹ in that domain cannot predict the exact actions that a more intelligent agent will take, even if the less intelligent system knows the more intelligent agent's goal(s).²

A domain is rich, relative to a given level of intelligence, if it's complicated enough that perfect decision-making isn't possible with that level of intelligence. For instance, tic-tac-toe is a non-rich domain, since tic-tac-toe is solved: from every board state there is an optimal move (or a set of equally good optimal moves), and it's not possible to play better than that. In other words, there's a "ceiling" on how good one can be at tic-tac-toe. If you³ were to play tic-tac-toe against a superintelligent computer, it couldn't come up with some undiscovered strategy to beat you, because no such strategy exists.⁴
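As a rough illustration of what "solved" means here, the following is a minimal Python sketch (not from the original article; the 9-character board encoding, the +1/0/-1 scoring, and all function names are our own illustrative assumptions). It exhaustively searches the tic-tac-toe game tree with minimax, computes the value of the empty board under optimal play, and checks that a player who always picks a minimax-optimal move never loses against any sequence of opponent moves.

```python
from functools import lru_cache
from typing import Optional

# Illustrative encoding (our assumption): the board is a 9-character string,
# cells indexed 0..8 row by row, "." for empty. X always moves first.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board: str) -> str:
    # X moves first, so it is X's turn whenever the piece counts are equal.
    return "X" if board.count("X") == board.count("O") else "O"


def play(board: str, i: int, player: str) -> str:
    """Return a new board with `player` placed on cell i."""
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def value(board: str) -> int:
    """Minimax value with optimal play by both sides: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # full board, no winner: draw
    player = to_move(board)
    children = [value(play(board, i, player))
                for i, cell in enumerate(board) if cell == "."]
    return max(children) if player == "X" else min(children)


def best_move(board: str) -> int:
    """Return an optimal move for the side to move (the first one found if several tie)."""
    player = to_move(board)
    moves = [i for i, cell in enumerate(board) if cell == "."]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: value(play(board, i, player)))


def optimal_player_ever_loses(board: str, me: str) -> bool:
    """Try every possible opponent reply; return True if `me`, always playing
    best_move, can be beaten from this position."""
    w = winner(board)
    if w is not None:
        return w != me
    if "." not in board:
        return False
    player = to_move(board)
    if player == me:
        return optimal_player_ever_loses(play(board, best_move(board), me), me)
    return any(optimal_player_ever_loses(play(board, i, player), me)
               for i, cell in enumerate(board) if cell == ".")


if __name__ == "__main__":
    empty = "." * 9
    print(value(empty))                           # 0: perfect play by both sides is a draw
    print(optimal_player_ever_loses(empty, "X"))  # False
    print(optimal_player_ever_loses(empty, "O"))  # False
```

The exhaustive search finishes almost instantly because the tic-tac-toe game tree is tiny. The same brute-force approach is hopeless for a domain like chess, whose game tree is far too large to search, which is one way of seeing why chess counts as rich in the sense used in this article.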

We say a domain is "rich" (from our perspective) if:

The space of potential actions and strategies is too large and/or irregular for us to find the optimal strategies (e.g. chess), or
The known mechanics of the domain do not permit us to place absolute bounds on which kinds of outcomes or goals are in principle achievable (e.g. geopolitics).

In general, any domain where we continue to discover better strategies over time is necessarily rich.

Vinge’s principle is relevant to AI safety since 1) it limits our ability to predict the actions of an AI with greater-than-human intelligence, and 2) it might limit the ability of an AI agent to safely design a more intelligent AI (or to modify itself into a more intelligent version).

¹ Intelligence here refers to the capacity to choose actions that successfully achieve one's goals within some or many domains.
² Vinge’s principle is named after Vinge’s Law, an idea about fiction writing which states that "characters cannot be significantly smarter than their authors… because to really know how a character like that would think, you’d have to be that smart yourself."
³ This assumes you know optimal play, which is realistic for most adults who have played a few games.
⁴ Since either player in tic-tac-toe can always force at least a draw with optimal play, nothing, no matter how capable, could ever beat you, whether you go first or second (assuming it’s playing fair!).