What is Conjecture's research agenda?
Conjecture is an AI research lab focused on "building Cognitive Emulation - an AI architecture that bounds systems' capabilities and makes them reason in ways that humans can understand and control". Conjecture hopes that this approach will allow for "scalable, auditable, controllable AI systems".
Cognitive Emulation (CoEm)
Conjecture's primary alignment program is "cognitive emulation", i.e., trying to make AI systems that emulate human reasoning.[1]
A CoEm, in the words of Connor Leahy and Gabriel Alfour:
• Is built on understandable, discoverable and implementable ML and computational building blocks.
• Does not have so much Magic[2] inside of it that we cannot even put bounds on its possible consequences and capabilities.
• Can be sufficiently understood and bounded to ensure it does not suddenly dramatically shift its behaviors, properties and capabilities.
• Is well situated in the human(ish) capabilities regime and, when in doubt, will default to human-like failure modes rather than completely unpredictable behaviors.
• Is retargetable enough to be deployed to solve many useful problems and not deviate into dangerous behavior, [as long] as it is used by a careful user.
CoEms, by design, would not be able to achieve far-greater-than-human intelligence. However, Conjecture hopes that CoEms could be used to help find formal solutions to the problem of aligning superintelligent AGI.
Interpretability for large language models
Conjecture has done interpretability research on large language models (LLMs):
- "Interpreting Neural Networks through the Polytope Lens" proposes that treating "polytopes" (rather than individual neurons or directions) as the fundamental unit of neural nets can reduce the amount of polysemanticity in our understanding of a model.
- "Circumventing interpretability: How to defeat mind-readers" explores potential methods an unaligned AI might use to make its "thoughts" difficult for us to interpret.
- "Current themes in mechanistic interpretability research" is an overview of mechanistic interpretability that includes perspectives from many organizations ("Conjecture, Anthropic, Redwood Research, OpenAI, and DeepMind as well as some independent researchers"). It covers research topics, tools, and practices; field building and research coordination; and theories of impact.
- "Basic Facts about Language Model Internals" lists some patterns that researchers have seen in LLM weights and activations.
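As a rough illustration of what a "polytope" means in the first item above, here is a minimal sketch, assuming a tiny network with ReLU activations. It is not code from the post: the network size, the random weights, and the helper names (`make_layer`, `activation_pattern`, `count_patterns`) are all hypothetical. The idea it shows is that each on/off pattern of the ReLU units picks out one region of input space, and nearby inputs usually share a region.

```python
import random

def make_layer(n_in, n_out, rng):
    """Random weights and biases for one dense layer (illustrative values only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def activation_pattern(layers, x):
    """Return one 0/1 flag per ReLU unit: which units are active for input x."""
    pattern = []
    h = list(x)
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        pattern.extend(1 if p > 0 else 0 for p in pre)
        h = [max(p, 0.0) for p in pre]  # ReLU
    return tuple(pattern)

def count_patterns(layers, grid=50, lo=-2.0, hi=2.0):
    """Count distinct activation patterns hit by a grid over a 2D input square."""
    seen = set()
    for i in range(grid):
        for j in range(grid):
            x = (lo + (hi - lo) * i / (grid - 1),
                 lo + (hi - lo) * j / (grid - 1))
            seen.add(activation_pattern(layers, x))
    return len(seen)

# Hypothetical 2-input network with two hidden layers of 4 ReLU units each.
rng = random.Random(0)
layers = [make_layer(2, 4, rng), make_layer(4, 4, rng)]
a = activation_pattern(layers, (0.10, 0.20))
b = activation_pattern(layers, (0.11, 0.20))
print("same polytope:", a == b)
print("patterns found on grid:", count_patterns(layers))
```

Inputs that share an activation pattern are all mapped by the same affine function, which is why the pattern can serve as a label for the region. The grid count is only a lower bound on how many regions intersect the sampled square.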
Outreach and communication
Conjecture's CEO, Connor Leahy, communicates publicly about AI safety.
[1] This doesn't mean "simulating" human brains, neurons, etc., but rather emulating the logical structure of human thought. In other words, a CoEm would "reason in the same way" as humans, regardless of structural details of how that reasoning is implemented.
[2] "Magic" is a tongue-in-cheek term for computation done by an AI that we don't understand.