Doublebrass's question on Reward Hacking
Super interesting! If this kind of reward hacking exists in current AI, does that have serious implications if someone wanted to deploy one on the stock market, for example? Would the AI seek to "cheat" by committing fraud or somehow gaining insider info rather than playing the stock market fairly?
Origin
|YouTube (comment link)|
|On video:|Reward Hacking: Concrete Problems in AI Safety Part 3|
|Asked on Discord?|No|