What are some AI governance exercises and projects I can try?
This list is largely focused on projects within AI policy rather than other career paths like AI safety (a research field about how to prevent risks from advanced artificial intelligence).
- [Public] Some AI Governance Research Ideas (from GovAI)
- Project page from AGI Safety Fundamentals and their Open List of Project Ideas
- AI Safety Ideas by Apart Research; EAF post
- Competitions like SafeBench (see example ideas)
- Student ML Safety Research Stipend Opportunity – provides stipends for doing ML research.
- course.mlsafety.org projects — CAIS is looking for someone to add details about these projects on course.mlsafety.org
- Distilling / summarizing / synthesizing / reviewing / explaining
- Forming your own views on AI safety (without stress!) – also see Neel Nanda's presentation slides and "Inside Views Resources" document
- "Mostly focused on AI" section of "A central directory for open research questions" – contains a list of links to projects, similar to this document
- Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision" (see the CCS probe sketch after this list)
- Answer some of the application questions from the winter 2022 SERI-MATS application process, such as Vivek Hebbar's problems
- 10 exercises from Akash in “Resources that (I think) new alignment researchers should know about”
- [T] Deception Demo Brainstorm has some ideas (message Thomas Larsen if these seem interesting)
- Alignment research at ALTER – interesting research problems, many have a theoretical mathematics flavor
- Open Problems in AI X-Risk [PAIS #5]
- Steven Byrnes: [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
- Evan Hubinger: Concrete experiments in inner alignment, ideas someone should investigate further, sticky goals
- Richard Ngo: Some conceptual alignment research projects, alignment research exercises
- Buck Shlegeris: Some fun ML engineering projects that I would think are cool, The case for becoming a black box investigator of language models
- Implement a key paper in deep reinforcement learning (see the REINFORCE sketch after this list)
- Amplify creative grants (old)
- “Paper replication resources” section in “How to pursue a career in technical alignment”
- ELK – How can we train a model to report its latent knowledge of off-screen events?
- Daniel Filan idea – studying competent misgeneralization without reference to a goal
- Summarize a reading from Reading What We Can
- Zac Hatfield-Dodds: “The list I wrote up for 2021 final-year-undergrad projects is at https://zhd.dev/phd/student-ideas.html - note that these are aimed at software engineering rather than ML, NLP, or AI Safety per se (most of those ideas I have stay at Anthropic, and are probably infeasible for student projects).” These projects are nonetheless good preparation for AI safety engineering careers.
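
To make the "Discovering Latent Knowledge in Language Models Without Supervision" item above more concrete, here is a minimal sketch of the contrast-consistent search (CCS) probe idea from that paper, assuming you have already extracted hidden states for contrast pairs (a statement and its negation) from a language model. The names (`CCSProbe`, `ccs_loss`, `train_probe`) and the hyperparameters are illustrative choices, not taken from the authors' code:

```python
import torch
import torch.nn as nn

class CCSProbe(nn.Module):
    """Linear probe mapping a hidden state to a probability of 'true'."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x))

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: the probe's probabilities for a statement and its negation
    # should sum to 1.
    consistency = (p_pos - (1 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution where everything is 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

def train_probe(pos: torch.Tensor, neg: torch.Tensor, epochs: int = 1000) -> CCSProbe:
    # pos, neg: hidden states for the "true" and "false" phrasings,
    # each of shape (n_pairs, hidden_dim).
    probe = CCSProbe(pos.shape[-1])
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ccs_loss(probe(pos), probe(neg))
        loss.backward()
        opt.step()
    return probe

if __name__ == "__main__":
    # Random stand-in activations; replace with real hidden states from a model.
    pos = torch.randn(256, 768)
    neg = torch.randn(256, 768)
    probe = train_probe(pos, neg)
```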
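
For the "implement a key paper in deep reinforcement learning" item, one common starting point is REINFORCE (Williams, 1992) on CartPole. The sketch below assumes PyTorch and Gymnasium are installed; the network size, learning rate, and episode count are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn
import gymnasium as gym

def run(episodes: int = 500, gamma: float = 0.99, lr: float = 1e-2) -> None:
    env = gym.make("CartPole-v1")
    # Small policy network: 4 observation dims -> 2 action logits.
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    opt = torch.optim.Adam(policy.parameters(), lr=lr)

    for ep in range(episodes):
        obs, _ = env.reset()
        log_probs, rewards = [], []
        done = False
        while not done:
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            obs, reward, terminated, truncated, _ = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            rewards.append(float(reward))
            done = terminated or truncated

        # Discounted returns, computed backwards over the episode, then normalized.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns = torch.tensor(list(reversed(returns)))
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)

        # Policy-gradient loss: maximize expected return.
        loss = -(torch.stack(log_probs) * returns).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

        if (ep + 1) % 50 == 0:
            print(f"episode {ep + 1}: return {sum(rewards):.0f}")

if __name__ == "__main__":
    run()
```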