What is Vinge’s principle?
Vinge’s principle says that, in "rich" domains, a system that is less intelligent[1] cannot predict exactly what a more intelligent system will do.[2]
A domain is rich, relative to a given level of intelligence, if it's complicated enough that perfect decision-making isn't possible at that level of intelligence. For instance, tic-tac-toe is a non-rich domain, since tic-tac-toe is solved: from every board state, there is an optimal move (or a set of equally optimal moves), and it's not possible to play better than that. In other words, there's a "ceiling" on how good one can be at tic-tac-toe: if you[3] play optimally, no opponent, however intelligent, can beat you.[4]
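To make the "ceiling" concrete, here is a minimal sketch of how a small program can solve tic-tac-toe by exhaustive search (plain minimax with memoization). The board encoding (a 9-character string, cells numbered 0 to 8 row by row) and the helper names `winner`, `value`, and `optimal_moves` are assumptions chosen for illustration, not part of any standard library.

```python
from functools import lru_cache

# The eight winning lines on a board whose cells are indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def other(player):
    return "O" if player == "X" else "X"


@lru_cache(maxsize=None)
def value(board, player):
    """Game value under optimal play, from X's point of view.

    +1 means X wins, 0 means a draw, -1 means O wins; `player` moves next.
    """
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0  # board full, nobody won
    results = [value(board[:i] + player + board[i + 1:], other(player))
               for i, cell in enumerate(board) if cell == " "]
    # X picks the move that maximizes the value, O the one that minimizes it.
    return max(results) if player == "X" else min(results)


def optimal_moves(board, player):
    """Return every empty cell whose resulting position keeps the optimal value."""
    best = value(board, player)
    return [i for i, cell in enumerate(board)
            if cell == " "
            and value(board[:i] + player + board[i + 1:], other(player)) == best]


EMPTY = " " * 9
print(value(EMPTY, "X"))                 # 0: optimal play from the start is a draw
print(optimal_moves("X" + " " * 8, "O")) # [4]: after a corner opening, only the center holds the draw
```

Because the whole game tree fits in memory, the search returns the exact value of every position. No amount of extra intelligence can improve on these moves, which is what makes the domain non-rich. In a rich domain such as chess, the same exhaustive search is not feasible, so no comparable ceiling is known.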
We say a domain is "rich" (from our perspective) if:
- The space of potential actions and strategies is too large and/or irregular for us to find the optimal strategies (e.g. chess), or
- The known mechanics of the domain do not permit us to place absolute bounds on which kinds of outcomes or goals are in principle achievable (e.g. geopolitics).
In general, any domain where we continue to discover better strategies over time is necessarily rich.
Vinge’s principle is relevant to AI safety because 1) it limits our ability to predict the actions of an AI with greater-than-human intelligence, and 2) it might limit the ability of an AI agent to safely design a more intelligent AI (or to modify itself into a more intelligent version).
[1] Intelligence here refers to the capacity to choose actions that successfully achieve one's goals within some or many domains.
[2] Vinge’s principle is named after Vinge’s Law, an idea about fiction writing which states that "characters cannot be significantly smarter than their authors… because to really know how a character like that would think, you’d have to be that smart yourself."
[3] This assumes you know optimal play, which is realistic for most human adults who have played a few games.
[4] Since the first player in tic-tac-toe can always force a draw (at worst) with optimal play, nothing, no matter how capable, could ever beat you if you go first (assuming it’s playing fair!).