Sudonym's Answer to What does alignment failure look like?
Alignment failure, at its core, is any time an AI's output deviates from what we intended. We have already witnessed alignment failure in simple AIs. Mostly, these amount to correlation being equated to causation. A good example was an AI built by Youtube to recognize animals being forced to fight for sport. The videos given to the AI were always set in some kind of arena, so the AI drew the simplest conclusion and matched videos where there were similar arenas—such as with robot combat tournaments.
|Original by:||Damaged (edits by sudonym)|