What are the "win conditions" for AI alignment?

There is currently no consensus among researchers on how to align an artificial general intelligence (AGI). Researchers approach the problem from many different angles, each working within their own paradigm.

The agent foundations agenda starts from the premise that we are fundamentally confused about alignment and need to work out the “true names” of various concepts: formalisms that point exactly at what we mean. For example, we need to pin down which decision theory and ontology an AGI should use.
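As a rough illustration of what a “true name” can look like (this example is not drawn from the agenda itself), the informal idea of “doing whatever is best” is often formalized as expected utility maximization:

```latex
% Expected utility maximization as an example of a formalism that
% pins down an informal concept ("choose the best action").
% A is the set of available actions, O the set of outcomes,
% P(o | a) the probability of outcome o if action a is taken,
% and U(o) the utility assigned to outcome o.
a^{*} \;=\; \arg\max_{a \in A} \sum_{o \in O} P(o \mid a)\, U(o)
```

Different decision theories disagree about how the term P(o | a) should be evaluated (for instance, causal versus evidential conditioning), which is part of why the agenda treats the choice of decision theory as an open problem.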

An agentic AGI must learn what we value, especially if it is a sovereign (an AI that acts autonomously in the world on our behalf). Reinforcement learning from human feedback (RLHF) attempts to teach an AI what we value by having humans evaluate its outputs and rate them positively or negatively, and using those ratings to fine-tune the model before it is deployed.
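In practice, this feedback is often used to train a separate reward model that scores outputs, and that model then provides the training signal for fine-tuning. The sketch below illustrates the reward-modeling step under toy assumptions: responses are stand-in feature vectors rather than text from a real language model, and the pairwise "chosen vs. rejected" format is one common way human ratings are collected.

```python
# A minimal sketch of the reward-modeling step in RLHF, under toy
# assumptions: responses are fixed-size feature vectors and the "reward
# model" is a small neural network. Names and shapes are illustrative,
# not from any particular library or paper.

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (prompt, response) feature vector to a scalar reward."""
    def __init__(self, feature_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: the response the human rated higher
    # should receive a higher scalar reward than the one rated lower.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Stand-in for human feedback: pairs of responses where the first
    # element of each pair was preferred by a human rater.
    chosen = torch.randn(16, 32)
    rejected = torch.randn(16, 32)

    for step in range(100):
        loss = preference_loss(model(chosen), model(rejected))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # The trained reward model would then supply the reward signal when
    # fine-tuning the language model with reinforcement learning
    # (e.g. PPO), before the model is deployed.
```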

See also the AI alignment landscape and formal alignment.