RoseMcClelland's Answer to What projects are Redwood Research working on?

What is an adversarial oversight scheme?


What work is Redwood doing on LLM interpretability?

Redwood is also working on interpretability tools, though at the time of writing we did not know of a published writeup of their interpretability results. As of April, they were focused on getting a complete understanding of nontrivial behaviors in relatively small models. They have released a website for visualizing transformers. Apart from the standard benefits of interpretability, this work might also help with solving Eliciting Latent Knowledge (ELK).

Answer Info
Original by: RoseMcClelland

