language models

From Stampy's Wiki
language models
Main Question: How might language models be relevant to AI alignment?
Child tag(s): gpt
Alignment Forum Tag
Wikipedia Page

Description

Language Models are a class of AI trained on text, usually to predict the next word or a word which has been obscured. They have the ability to generate novel prose or code based on an initial prompt, which gives rise to a kind of natural language programming called prompt engineering. The most popular architecture for very large language models is called a transformer, which follows consistent scaling laws with respect to the size of the model being trained, meaning that a larger model trained with the same amount of compute will produce results which are better by a predictable amount (when measured by the 'perplexity', or how surprised the AI is by a test set of human-generated text).

Language Models are a class of AI trained on text, usually to predict the next word or a word which has been obscured. They have the ability to generate novel prose or code based on an initial prompt, which gives rise to a kind of natural language programming called prompt engineering. The most popular architecture for very large language models is called a transformer, which follows consistent scaling laws with respect to the size of the model being trained, meaning that a larger model trained with the same amount of compute will produce results which are better by a predictable amount (when measured by the 'perplexity', or how surprised the AI is by a test set of human-generated text).

See also

  • GPT - A family of large language models created by OpenAI

Canonically answered

Why might people try to build AGI rather than stronger and stronger narrow AIs?

Show your endorsement of this answer by giving it a stamp of approval!

Making a narrow AI for every task would be extremely costly and time-consuming. By making a more general intelligence, you can apply one system to a broader range of tasks, which is economically and strategically attractive.

Of course, for generality to be a good option there are some necessary conditions. You need an architecture which is straightforward enough to scale up, such as the transformer which is used for GPT and follows scaling laws. It's also important that by generalizing you do not lose too much capacity at narrow tasks or require too much extra compute for it to be worthwhile.

Whether or not those conditions actually hold it seems like many important actors (such as DeepMind and OpenAI) believe that they do, and are therefore focusing on trying to build an AGI in order to influence the future, so we should take actions to make it more likely that AGI will be developed safety.

Additionally, it is possible that even if we tried to build only narrow AIs, given enough time and compute we might accidentally create a more general AI than we intend by training a system on a task which requires a broad world model.

See also:

What is GPT-3?

Show your endorsement of this answer by giving it a stamp of approval!

GPT-3 is the newest and most impressive of the GPT (Generative Pretrained Transformer) series of large transformer-based language models created by OpenAI. It was announced in June 2020, and is 100 times larger than its predecessor GPT-2.[1]

Gwern has several resources exploring GPT-3's abilities, limitations, and implications including:

Vox has an article which explains why GPT-3 is a big deal.

  1. GPT-3: What’s it good for? - Cambridge University Press

What are some of the most impressive recent advances in AI capabilities?

Show your endorsement of this answer by giving it a stamp of approval!

GPT-3 showed that transformers are capable of a vast array of natural language tasks, codex/copilot extended this into programming. One demonstrations of GPT-3 is Simulated Elon Musk lives in a simulation. Important to note that there are several much better language models, but they are not publicly available.

DALL-E and DALL-E 2 are among the most visually spectacular.

MuZero, which learned Go, Chess, and many Atari games without any directly coded info about those environments. The graphic there explains it, this seems crucial for being able to do RL in novel environments. We have systems which we can drop into a wide variety of games and they just learn how to play. The same algorithm was used in Tesla's self-driving cars to do complex route finding. These things are general.

Generally capable agents emerge from open-ended play - Diverse procedurally generated environments provide vast amounts of training data for AIs to learn generally applicable skills. Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning shows how these kind of systems can be trained to follow instructions in natural language.

GATO shows you can distill 600+ individually trained tasks into one network, so we're not limited by the tasks being fragmented.

Non-canonical answers

What are OpenAI Codex and GitHub Copilot?

Show your endorsement of this answer by giving it a stamp of approval!

Codex / Github Copilot are AIs that use GPT-3 to write and edit code. When given some input code and comments describing the intended function, they will write output that extends the prompt as accurately as possible.