Almost, but not entirely, Unreasonable's question on Reward Hacking Reloaded
How is AI 'per se' NOT a HUMAN REWARD HACK? It seems humans cannot be bothered to solve problems at a marginal level anymore, so some specialists develop AI to 'think / solve' electronically on behalf of people, ultimately displacing them entirely. Designed self-annihilating is AI, no more, no less.
Awesome, clear insight into KPIs: Show me how you measure me, and I'll show you how I behave. It's an age-old operations vs management issue, where both sets are trying to MINIMISE the other's influence, while trying to MAXIMISE their own. What an awesome problem to hand to a Technocratic Optimizing System. Who knows, it may even turn out balanced, in which case Management will summarily drop it. Maybe there IS hope for AI?
|Asked by:||Almost, but not entirely, Unreasonable|
|Origin:||YouTube (comment link)|
|On video:||Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5|
|Asked on Discord?||No|