From Stampy's Wiki
Main Question: Which organizations are working on AI alignment?
Child tag(s): ai safety camp, future of humanity institute, miri

Canonically answered

I want to work on AI alignment. How can I get funding?

See the Future Funding List for up-to-date information!

The organizations which most regularly give grants to individuals working towards AI alignment are the Long Term Future Fund, the Survival and Flourishing Fund (SFF), the OpenPhil AI Fellowship and early career funding, the Future of Life Institute, the Future of Humanity Institute, and the Center on Long-Term Risk Fund. If you're able to relocate to the UK, CEEALAR (aka the EA Hotel) can be a great option, as it offers free food and accommodation for up to two years, as well as contact with others who are thinking about these issues. There are also opportunities from smaller grantmakers which you might be able to pick up if you get involved.

If you want to work on support or infrastructure rather than directly on research, the EA Infrastructure Fund may be able to help. In general, you can talk to EA funds before applying.

Each grant source has its own criteria for funding, but in general they are looking for candidates with evidence that they're keen and able to do good work towards reducing existential risk (for example, by completing an AI Safety Camp project). The EA Hotel in particular has less stringent requirements, as they're able to support people at very low cost. If you'd like to talk to someone who can offer advice on applying for funding, AI Safety Support offers free calls.

Another option is to get hired by an organization which works on AI alignment; see the follow-up question for advice on that.

It's also worth checking the AI Alignment tag on the EA funding sources website for up-to-date suggestions.

What approaches are AI alignment organizations working on?

Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.

The following is a massive oversimplification; each organization actually pursues many different avenues of research. Read the 2021 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:

  • The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide any assurance that a seed AI, once built and activated, will do good things.
  • The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI where they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function.
  • Paul Christiano's Alignment Research Center focuses on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
  • The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
  • Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
  • OpenAI is in a state of flux after major changes to their safety team.
  • DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
  • EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
  • Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.
  • Conjecture is an alignment startup which aims to scale alignment research, including new frames for reasoning about large language models, scalable mechanistic interpretability, and history and philosophy of alignment.

There are many other projects around AI Safety, such as the Windfall clause, Rob Miles’s YouTube channel, AI Safety Support, etc.

Are Google, OpenAI, etc. aware of the risk?

The major AI companies are thinking about this. OpenAI was founded specifically with the intention of countering risks from superintelligence, many people at Google, DeepMind, and other organizations are convinced by the arguments, and few genuinely oppose work in the field (though some claim it's premature). For example, the paper Concrete Problems in AI Safety was a collaboration between researchers at Google Brain, Stanford, Berkeley, and OpenAI.

However, the vast majority of the effort these organizations put forward goes towards capabilities research rather than safety.

Non-canonical answers

Which organizations are working on AI alignment?

There are numerous organizations working on AI alignment. A partial list includes:

For more information about the research happening at some of these organizations, see the 2021 AI Alignment Literature Review and Charity Comparison.

Since AI alignment is a growing field, new organizations are often created. Also, in addition to these organizations, there are a number of research groups at different universities whose research also focuses on AI alignment.

What is the Center for Human Compatible AI (CHAI)?

CHAI is an academic research organization affiliated with UC Berkeley. It is led by Stuart Russell and includes many other professors and grad students pursuing a diverse array of approaches. For more information see their 2022 progress report.

Stuart wrote the book Human Compatible, in which he outlines his AGI alignment strategy, which is based on cooperative inverse reinforcement learning (CIRL). The basic idea of CIRL is to play a cooperative game where both the agent and the human are trying to maximize the human's reward, but only the human knows what that reward is. Since the AGI is uncertain about the reward, it will defer to humans and remain corrigible.
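The core CIRL dynamic can be sketched as a toy assistance game. This is a hypothetical illustration, not CHAI's actual code: the robot starts with a uniform prior over which of two options the human rewards, does a Bayesian update from noisy human actions, and defers to the human whenever it is still uncertain.

```python
# Toy sketch of the CIRL idea (hypothetical example): the human knows
# which option is rewarded; the robot starts uncertain, updates a
# posterior from noisy human actions, and defers while uncertain.

def update_posterior(prior_a, human_action, noise=0.1):
    """Bayes update on P(reward favors A) after seeing the human pick "A" or "B".
    The human picks the truly rewarded option with probability 1 - noise."""
    like_a = (1 - noise) if human_action == "A" else noise
    like_b = noise if human_action == "A" else (1 - noise)
    return like_a * prior_a / (like_a * prior_a + like_b * (1 - prior_a))

def robot_policy(p_a, threshold=0.9):
    """Act only when confident; otherwise defer to the human."""
    if p_a >= threshold:
        return "do A"
    if p_a <= 1 - threshold:
        return "do B"
    return "defer"

p = 0.5  # uniform prior over the human's reward
assert robot_policy(p) == "defer"  # uncertain, so the robot defers
for obs in ["A", "A", "A"]:
    p = update_posterior(p, obs)
print(robot_policy(p))  # confident now: "do A"
```

The deference falls out of the uncertainty: as long as the posterior stays near 0.5, acting has lower expected value than letting the human reveal more about the reward.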

Other work includes clusterability in neural networks: measuring the modularity of a trained network by treating it as a weighted graph and performing a graph n-cut.
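The n-cut idea above can be sketched on a tiny example. This is a hypothetical illustration of the general technique, not CHAI's implementation: neurons become graph nodes, absolute weights become edge strengths, and we brute-force the bipartition with the lowest normalized cut. A low n-cut score means the network splits into weakly interacting modules.

```python
from itertools import combinations

def ncut(adj, part):
    """Normalized cut of a bipartition: cut(A,B)/vol(A) + cut(A,B)/vol(B)."""
    nodes = range(len(adj))
    other = [i for i in nodes if i not in part]
    cut = sum(adj[i][j] for i in part for j in other)
    vol = lambda grp: sum(adj[i][j] for i in grp for j in nodes)
    return cut / vol(part) + cut / vol(other)

def best_bipartition(adj):
    """Brute-force the minimum n-cut over all non-trivial bipartitions."""
    nodes = list(range(len(adj)))
    best = None
    for k in range(1, len(nodes) // 2 + 1):
        for part in combinations(nodes, k):
            score = ncut(adj, set(part))
            if best is None or score < best[0]:
                best = (score, set(part))
    return best

# Two strongly connected 2-neuron clusters, {0,1} and {2,3}, joined
# only by weak 0.1 edges -- a clearly modular "network".
adj = [
    [0.0, 5.0, 0.1, 0.0],
    [5.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 5.0],
    [0.0, 0.1, 5.0, 0.0],
]
score, part = best_bipartition(adj)
print(part)  # the modular split: {0, 1}
```

Real networks are far too large for brute force, so spectral methods are used to approximate the minimum n-cut; the toy above only illustrates what is being optimized.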
