What is FAR's theory of change?

From Stampy's Wiki

Canonical Answer

FAR's theory of change is to incubate new, scalable alignment research agendas. Right now there is a small range of agendas being pursued at scale (largely Reinforcement Learning from Human Feedback and interpretability), followed by a long tail of very diverse agendas being pursued by single individuals (mostly independent researchers or graduate students) or 2-3 person teams. I believe there are a lot of valuable ideas in this long tail that could be scaled, but this isn't happening due to a lack of institutional support. It makes sense that the major organisations want to focus on their own specific agendas -- there's a benefit to being focused! -- but it means a lot of valuable agendas are slipping through the cracks.

FAR's current approach to solving this problem is to build out a technical team (research engineers, junior research scientists, technical communication specialists) and provide support to a broad range of agendas pioneered by external research leads. FAR will double down on the agendas that work and invest more in them. This model has already seen a fair amount of demand, which suggests product-market fit, but we still want to iterate and see if we can improve it. For example, in the long term FAR might want to bring some or all research leads in-house.

In terms of concrete agendas, some examples of what FAR is working on:

  • Adversarial attacks against narrowly superhuman systems like AlphaGo.
  • Language model benchmarks for value learning.
  • The inverse scaling law prize.

You can read more about FAR in their post.

Tags: far

Question Info
Asked by: RoseMcClelland
Origin: Wiki
Date: 2022/09/13

