What is the Center for AI Safety (CAIS)'s research agenda?
The Center for AI Safety (CAIS) is an organization that works on AI safety, a research field about how to prevent risks from advanced artificial intelligence. Their work spans technical research, conceptual research, and field-building.
Their technical research focuses on improving the safety of existing AI systems, and often involves building benchmarks and testing models against those benchmarks. It includes work on:
- Robustness, for example their analysis of distribution shift, their evaluation of LLM rule-following, and their proposed data processing method for improving robustness.
- Transparency, where they have presented representation engineering (RepE) as an emerging approach (see the first sketch after this list).
- Machine ethics, where their most well-known work includes the ETHICS dataset and MACHIAVELLI benchmark for evaluating language models.
- Anomaly detection, where they have worked on establishing a baseline for detecting out-of-distribution examples, and have proposed outlier exposure (OE), a method that improves anomaly detection by training the detector on an auxiliary dataset of known anomalies (see the second sketch after this list).
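Representation engineering is easiest to convey with a toy example. The sketch below shows one simple "reading" recipe under heavy simplification: contrast a model's hidden states on prompt sets that differ in some concept (e.g., honest vs. dishonest statements), take the mean difference as a direction, and score new activations by projection onto it. The RepE paper uses more sophisticated methods (e.g., PCA over activation differences); the array names here are placeholders, not CAIS's code.

```python
import numpy as np

# `acts_pos` and `acts_neg` (shape: n_prompts x hidden_dim) stand in for
# hidden states collected from a real model on contrasting prompt sets;
# they are placeholders in this sketch.

def reading_vector(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """Difference of mean activations between contrasting prompt sets,
    normalized to unit length."""
    v = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_score(acts: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Project activations onto the reading vector; larger values
    indicate stronger expression of the concept."""
    return acts @ v
```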
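Both anomaly-detection ideas above are also concrete enough to sketch. The widely used baseline scores an input by its maximum softmax probability (MSP): confident predictions suggest in-distribution data. Outlier exposure then adds a training term that pushes the model's predictions on an auxiliary outlier dataset toward the uniform distribution. The following is a minimal PyTorch-style illustration of those two ideas, not CAIS's released implementation; the weight `lam` and the tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def msp_anomaly_score(logits: torch.Tensor) -> torch.Tensor:
    """Baseline OOD score: the negative maximum softmax probability.
    Higher scores mean the input looks more anomalous."""
    return -F.softmax(logits, dim=1).max(dim=1).values

def oe_loss(in_logits: torch.Tensor, in_labels: torch.Tensor,
            outlier_logits: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Outlier exposure: standard cross-entropy on in-distribution data,
    plus (up to a constant) the cross-entropy between the uniform
    distribution and the model's predictions on outliers."""
    ce = F.cross_entropy(in_logits, in_labels)
    uniform_ce = (torch.logsumexp(outlier_logits, dim=1)
                  - outlier_logits.mean(dim=1)).mean()
    return ce + lam * uniform_ce
```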
Their conceptual research has included:
- Surveys of the field: “Unsolved Problems in ML Safety” (2022), “X-Risk Analysis for AI Research” (2022), “An Overview of Catastrophic AI Risks” (2023), and “AI Deception: A Survey of Examples, Risks, and Potential Solutions” (2023)
Their field-building projects include:
- The May 2023 Statement on AI Risk – a statement signed by many AI scientists and other notable figures
- The CAIS Compute Cluster, which offers compute for AI safety research
- Prize incentives for safety-relevant research such as improving ML safety benchmarks, moral uncertainty detection by ML systems, and forecasting by ML systems
- An ML Safety course and scholarships for ML students doing safety-related research
Not to be confused with Comprehensive AI Services, a conceptual model of artificial general intelligence proposed by Eric Drexler, also abbreviated CAIS.