From Stampy's Wiki
My initial response at 5:25 - Maybe sacrificing a lot of world state for a bit and then cleaning up later is cheaper than making minimal changes throughout? Seems unlikely. Wait, it's essentially acting like the world can just pause in place while it accomplishes its task! The more it knows about the prior world state, the more it's trying to correct for changes, and the more points it would get for successfully correcting them.
Result: ok, yeah, it's trying to reduce all changes, not just its changes.
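
A toy sketch of that point, assuming the penalty is just a count of world-state features that differ from the starting state. The names `distance` and `penalized_reward`, the feature dictionary, and the numbers are all made up for illustration and are not taken from the video or the paper. It shows that a change made by someone else costs the agent exactly as much as a change it made itself:

```python
def distance(state_a, state_b):
    """Count the features whose values differ between two world states."""
    keys = set(state_a) | set(state_b)
    return sum(1 for k in keys if state_a.get(k) != state_b.get(k))


def penalized_reward(task_reward, initial_state, final_state, weight=1.0):
    """Task reward minus a penalty for every difference from the initial state.

    The penalty does not know who caused a difference, only that it exists.
    """
    return task_reward - weight * distance(initial_state, final_state)


# Hypothetical world state before the agent starts.
initial = {"tea_made": False, "vase": "intact", "milk_in_fridge": True}

# The agent makes tea and nothing else changes.
agent_only = {"tea_made": True, "vase": "intact", "milk_in_fridge": True}

# The agent makes tea, and meanwhile a person takes the milk out of the fridge.
with_other = {"tea_made": True, "vase": "intact", "milk_in_fridge": False}

print(penalized_reward(10, initial, agent_only))  # 9.0
print(penalized_reward(10, initial, with_other))  # 8.0
```

Under this assumed penalty the agent scores higher if it puts the milk back, even though it never moved it, which is the "world paused in place" behaviour described above.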

6:55 - Well that seems like a lot of predictions to make, but besides that, maybe it spends too much time trying to precisely put back all the milk and sugar? This is harder.
Result: ok, you mention that about the distance measure.

Asked by: Aexis Rai
YouTube (comment link)
On video: Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1
Date: 2017-06-18T17:20
