What is the Center for Human Compatible AI (CHAI)?

CHAI is an academic research organization affiliated with UC Berkeley. It is led by Stuart Russell, but includes many other professors and grad students pursuing a diverse array of approaches. For more information, see CHAI's 2022 progress report.

Russell's book Human Compatible outlines his AGI alignment strategy, which is based on cooperative inverse reinforcement learning (CIRL). The basic idea of CIRL is a cooperative game in which the agent and the human together try to maximize the human's reward, but only the human knows what that reward is. Because the AGI remains uncertain about the reward, it defers to humans and stays corrigible.
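
Since only the human knows the reward, the robot's rational move is Bayesian: infer the reward from human behavior and act under the resulting uncertainty. Here is a minimal Python sketch of that dynamic under toy assumptions (two candidate goals, a noisily rational human, a single observed action); the setup and all names are illustrative, not CHAI's implementation.

```python
# A toy sketch of the CIRL idea, not CHAI's actual code. Hypothetical
# setup: the reward parameter theta says which of two goals is valuable;
# only the human knows theta. The robot watches one human action, does a
# Bayesian update on its belief over theta, then acts on its posterior.

import numpy as np

rng = np.random.default_rng(0)
GOALS = [0, 1]

def action_probs(theta, rationality=2.0):
    """A noisily rational human prefers moves toward the true goal theta."""
    utilities = np.array([1.0 if g == theta else 0.0 for g in GOALS])
    probs = np.exp(rationality * utilities)
    return probs / probs.sum()

def update_belief(prior, observed_action):
    """Bayes' rule: P(theta | action) is proportional to P(action | theta) * P(theta)."""
    likelihood = np.array([action_probs(theta)[observed_action] for theta in GOALS])
    posterior = prior * likelihood
    return posterior / posterior.sum()

true_theta = 1                         # known to the human, hidden from the robot
belief = np.array([0.5, 0.5])          # robot's uniform prior over theta
human_move = rng.choice(GOALS, p=action_probs(true_theta))
belief = update_belief(belief, human_move)
robot_move = GOALS[int(np.argmax(belief))]
print(f"human moved toward goal {human_move}, "
      f"robot's posterior over theta = {belief}, robot pursues goal {robot_move}")
```

Because the posterior never collapses to certainty after finitely many observations, the robot keeps treating human actions as evidence about the reward, which is the mechanism behind CIRL's deference and corrigibility argument.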

Other CHAI work includes research on clusterability in neural networks, which tries to measure a network's modularity by viewing its neurons and weights as a graph and computing the graph's n-cut.
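
As a rough illustration of that approach, the sketch below treats each neuron as a graph node and absolute connection weights as edge weights, then scores a split of the graph with the normalized cut (a lower n-cut suggests a more modular network). The random weight matrices stand in for a trained network, and a single spectral bipartition stands in for the paper's full clustering procedure; both are simplifying assumptions.

```python
# A minimal sketch of the clusterability idea under simplifying assumptions:
# build a weighted neuron graph from an MLP's weight matrices, bipartition
# it with the Fiedler vector of the graph Laplacian, and report the n-cut.

import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3]                       # toy MLP; weights are random stand-ins
weights = [rng.normal(size=(layer_sizes[i + 1], layer_sizes[i]))
           for i in range(len(layer_sizes) - 1)]

# Adjacency matrix over all neurons: edge weight = |connection weight|.
n = sum(layer_sizes)
A = np.zeros((n, n))
offsets = np.cumsum([0] + layer_sizes)
for i, W in enumerate(weights):
    lo_in, lo_out = offsets[i], offsets[i + 1]
    block = np.abs(W)
    A[lo_out:lo_out + W.shape[0], lo_in:lo_in + W.shape[1]] = block
    A[lo_in:lo_in + W.shape[1], lo_out:lo_out + W.shape[0]] = block.T

# Spectral bipartition: sign of the Fiedler vector (second-smallest
# eigenvector of the graph Laplacian L = D - A).
deg = A.sum(axis=1)
L = np.diag(deg) - A
eigvals, eigvecs = np.linalg.eigh(L)
partition = eigvecs[:, 1] > 0

# Normalized cut: crossing weight, normalized by each side's volume.
cut = A[partition][:, ~partition].sum()
ncut = cut / deg[partition].sum() + cut / deg[~partition].sum()
print(f"n-cut of the spectral bipartition: {ncut:.3f}  (lower = more modular)")
```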
