What is the Future of Humanity Institute working on?
FHI does a lot of work on non-technical AI safety, but as far as we can tell its primary technical agenda is the Causal Incentives group (joint between FHI and DeepMind), which uses notions from causality to study incentives and their application to AI safety. Recent work includes:
- Agent Incentives: A Causal Perspective, a paper which formalizes concepts such as the value of information and instrumental control incentives.
- Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective, a paper which gives a theoretical analysis of wireheading.