What are some exercises and projects I can try?
This document is loosely organized into projects focusing on technical AI safety research and projects focusing on AI policy.
Consider joining some online AI safety communities (see AISafety.com/communities) and asking for feedback or ideas.
Technical AI safety
- Levelling Up in AI Safety Research Engineering [Public] (LW)
  - Highly recommended list of AI safety research engineering resources for people at various skill levels.
- Research directions from OpenAI’s former Superalignment Fast Grants
- Alignment Jams / hackathons from Apart Research
  - Some past / upcoming hackathons: LLM, interpretability 1, AI test, interpretability 2, oversight, governance
  - Resources: black-box investigator of language models, interpretability playground (LW), AI test, oversight, governance
  - Projects on AI Safety Ideas: LLM, interpretability, AI test
  - How to run one as an in-person event at your school
- 200 Concrete Open Problems in Mechanistic Interpretability by Neel Nanda (a minimal starter sketch using TransformerLens appears after this list)
- Student ML Safety Research Stipend Opportunity – provides stipends for doing ML research.
- Projects week from the alignment track of AI Safety Fundamentals
- "Technical/theoretical AI safety/alignment" section of "A central directory for open research questions" – contains a list of links to projects, similar to this document
- Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision" (a sketch of the paper's CCS objective appears after this list)
- AI Alignment Awards, a contest that concluded in May 2023
- Answer some of the application questions from the winter 2022 SERI-MATS, such as Vivek Hebbar's problems
- [T] Deception Demo Brainstorm has some ideas (message Thomas Larsen if these seem interesting)
- Alignment research at ALTER – interesting research problems, many of which have a theoretical math flavor
- Steven Byrnes: [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
- Evan Hubinger: Concrete experiments in inner alignment, ideas someone should investigate further, sticky goals
- Richard Ngo: Some conceptual alignment research projects, alignment research exercises
- Buck Shlegeris: Some fun ML engineering projects that I would think are cool, The case for becoming a black box investigator of language models
- Implement a key paper in deep reinforcement learning (a minimal REINFORCE starting point appears after this list)
- “Paper replication resources” section in “How to pursue a career in technical alignment”
- Zac Hatfield-Dodds: “The list I wrote up for 2021 final-year-undergrad projects is at https://zhd.dev/phd/student-ideas.html — note that these are aimed at software engineering rather than ML, NLP, or AI Safety per se (most of those ideas I have stay at Anthropic, and are probably infeasible for student projects).” These projects are good preparation for AI safety engineering careers.
- Daniel Filan’s idea about studying competent misgeneralization
- Owain Evans + Stuart Armstrong: “AI Safety Research Project Ideas”
- Singular Learning Theory Exercises — A researcher at Timaeus created this list of exercises for the Distilling Singular Learning Theory Sequence.
- CAIS has some project ideas for demonstrating techniques in ML safety.
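For the mechanistic interpretability problems, most exercises start by loading a small model and inspecting its activations. Here is a minimal sketch using TransformerLens, the library Neel Nanda's problem list is built around; the choice of GPT-2 and the prompt are just illustrative, and the exact API may differ across library versions.

```python
# Load a small model with TransformerLens, run it on a prompt, and inspect
# one cached attention pattern - a typical starting point for the exercises.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small model for quick experiments

prompt = "The Eiffel Tower is located in the city of"
logits, cache = model.run_with_cache(prompt)

# Most likely next token according to the model.
next_token = logits[0, -1].argmax()
print(model.tokenizer.decode(next_token))

# Attention pattern for layer 0, head 0: shape (query_pos, key_pos).
attn = cache["pattern", 0][0, 0]
print(attn.shape)
```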
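For extending "Discovering Latent Knowledge in Language Models Without Supervision", the central object is the Contrast-Consistent Search (CCS) loss: a probe on contrast-pair activations is trained so that p(x+) is close to 1 - p(x-), while a confidence term discourages the degenerate answer p = 0.5. The sketch below uses random tensors as stand-ins for real hidden states, so the hidden size, pair count, and training settings are assumptions rather than the paper's setup.

```python
# Minimal CCS sketch: train a linear probe with the consistency + confidence
# loss from Burns et al. (2022). Replace the random tensors with hidden
# states from a language model run on contrast pairs ("X? Yes" / "X? No").
import torch
import torch.nn as nn

hidden_dim = 768   # assumed hidden size; depends on the model being probed
n_pairs = 512      # assumed number of contrast pairs

pos = torch.randn(n_pairs, hidden_dim)  # stand-in activations for "Yes" versions
neg = torch.randn(n_pairs, hidden_dim)  # stand-in activations for "No" versions

# Normalize each set separately (the paper does this to remove the trivial
# "ends with Yes vs. No" direction).
pos = (pos - pos.mean(0)) / (pos.std(0) + 1e-8)
neg = (neg - neg.mean(0)) / (neg.std(0) + 1e-8)

probe = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(1000):
    p_pos = probe(pos).squeeze(-1)
    p_neg = probe(neg).squeeze(-1)
    consistency = (p_pos - (1 - p_neg)) ** 2       # p(x+) should equal 1 - p(x-)
    confidence = torch.minimum(p_pos, p_neg) ** 2  # push away from p = 0.5 everywhere
    loss = (consistency + confidence).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# On real activations, predict with 0.5 * (p_pos + 1 - p_neg) thresholded at
# 0.5, keeping in mind that the probe's direction is only determined up to a sign.
```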
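For implementing a deep RL paper, a common warm-up is vanilla policy gradients (REINFORCE) on CartPole before tackling the paper's own algorithm. The sketch below uses Gymnasium and PyTorch; the network size, learning rate, and episode count are illustrative choices, not tuned values.

```python
# REINFORCE on CartPole-v1: sample an episode, compute discounted returns,
# and take a policy-gradient step.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # obs -> action logits
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs, info = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards through the episode, then
    # normalized to reduce gradient variance.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```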
AI policy/strategy/governance
- [Public] Some AI Governance Research Ideas (from GovAI)
- Compute Research Questions and Metrics - Transformative AI and Compute [4/4]
- Week 9: Projects for the AI Safety Fundamentals course on AI governance
- The Alignment Jams / hackathons from Apart Research sometimes focus on AI governance
- "AI policy/strategy/governance" section of "A central directory for open research questions" – contains a list of links to projects, similar to this document
Both technical research and AI governance
- AI Safety Ideas by Apart Research; EA Forum post
- Distilling / summarizing / synthesizing / reviewing / explaining
- Forming your own views on AI safety (without stress!) — also see Neel's presentation slides and "Inside Views Resources" doc
- 10 exercises from Akash in “Resources that (I think) new alignment researchers should know about”
- Important, actionable research questions for the most important century (Holden Karnofsky)
- Amplify creative grants (old)
- Summarize a reading from Reading What We Can