What is Conjecture's research agenda?
Conjecture is an AI research lab focused on "building Cognitive Emulation - an AI architecture that bounds systems' capabilities and makes them reason in ways that humans can understand and control". Conjecture hopes that this approach will allow for "scalable, auditable, controllable AI systems".
Cognitive Emulation (CoEm)
Conjecture's primary alignment program is "cognitive emulation", i.e., trying to make AI systems that emulate human reasoning.1
A CoEm, in the words of Connor Leahy and Gabriel Alfour:
• Is built on understandable, discoverable and implementable ML and computational building blocks.
• Does not have so much Magic2 inside of it that we cannot even put bounds on its possible consequences and capabilities.
• Can be sufficiently understood and bounded to ensure it does not suddenly dramatically shift its behaviors, properties and capabilities.
• Is well situated in the human(ish) capabilities regime and, when in doubt, will default to human-like failure modes rather than completely unpredictable behaviors.
• Is retargetable enough to be deployed to solve many useful problems and not deviate into dangerous behavior, [as long] as it is used by a careful user.
CoEms, by design, would not be able to achieve far-greater-than-human intelligence. However, Conjecture hopes that CoEms could be used to help find formal solutions to the problem of aligning superintelligent AGI.
Interpretability for large language models
Conjecture has done interpretability research on large language models (LLMs), i.e., AI models that take in some text and predict how the text is most likely to continue:
• "Interpreting Neural Networks through the Polytope Lens" proposes that treating "polytopes" (rather than individual neurons or directions) as the fundamental unit of neural nets can reduce the amount of polysemanticity in our understanding of a model (see the toy sketch after this list).
• "Circumventing interpretability: How to defeat mind-readers" explores potential methods an unaligned AI might use to make its "thoughts" difficult for us to interpret.
• "Current themes in mechanistic interpretability research" is an overview of mechanistic interpretability that includes perspectives from many organizations ("Conjecture, Anthropic, Redwood Research, OpenAI, and DeepMind as well as some independent researchers"). It covers research topics, tools, and practices; field building and research coordination; and theories of impact.
• "Basic Facts about Language Model Internals" lists some patterns that researchers have seen in LLM weights (the model's trained parameters) and activations.
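To give a rough sense of what a "polytope" is here (this toy sketch is not code from the paper, and the tiny random network in it is purely hypothetical): a ReLU network is piecewise affine, so the on/off pattern of its ReLUs identifies which polytope (linear region) of input space an input falls in, and all inputs sharing that pattern are processed by the same affine map.

```python
# Toy illustration of the "polytope" idea for ReLU networks.
# The network, its sizes, and all values are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer ReLU MLP with random weights.
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def forward(x):
    pre = W1 @ x + b1          # hidden-layer pre-activations
    post = np.maximum(pre, 0)  # ReLU
    return W2 @ post + b2, (pre > 0)

def polytope_code(x):
    """The ReLU on/off pattern; inputs sharing this code lie in the same
    polytope, where the whole network computes a single affine function."""
    _, pattern = forward(x)
    return tuple(pattern.astype(int))

x = rng.normal(size=4)
eps = 1e-4 * rng.normal(size=4)

print(polytope_code(x))
# A small enough perturbation usually stays inside the same polytope,
# so the two codes typically match:
print(polytope_code(x) == polytope_code(x + eps))
```

Under this framing, interpretability questions are asked about regions of activation space and the affine maps attached to them, rather than about individual neurons.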
Outreach and communication
Conjecture's CEO, Connor Leahy, does public communication about AI safety, the research field concerned with preventing risks from advanced artificial intelligence.
This doesn't mean "simulating" human brains, neurons, etc., but rather emulating the logical structure of human thought. In other words, a CoEm would "reason in the same way" as humans, regardless of structural details of how that reasoning is implemented. ↩︎
"Magic" is a tongue-in-cheek term for computation done by an AI that we don't understand. ↩︎