Quantilizers: AI That Doesn't Try Too Hard

From Stampy's Wiki

Quantilizers: AI That Doesn't Try Too Hard
Channel: Robert Miles
Published: 2020-12-13T20:46:21Z
Views: 44016
Likes: 4440
202 questions on this video!
Question | YouTube Likes | Asked on Discord? | Answered By
Huntracony's question on Quantilizers | 122 | true |
Peter Smythe's question on Quantilizers | 41 | true |
Life Happens's question on Quantilizers | 39 | true | Plex's Answer to Quantilizers on 2020-12-15T12:08:55 by Life Happens
Aaron Rotenberg's question on Quantilizers | 38 | true |
Halyo Alex's question on Quantilizers | 36 | true |
SirMethos's question on Quantilizers | 17 | true |
Ben Crulis's question on Quantilizers | 17 | true |
Just nobody's question on Quantilizers | 16 | true |
Ardent Drops's question on Quantilizers | 15 | true |
SlimThrull's question on Quantilizers | 13 | true | Robert hildebrandt's Answer to Quantilizers on 2020-12-14T03:13:46 by SlimThrull
Daniel Weber's question on Quantilizers | 8 | false |
Jesse Poulton's question on Quantilizers | 4 | true |
Uberchops's question on Quantilizers | 3 | true | Sudonym's Answer to Uberchops's question on Quantilizers
Jo Flo's question on Quantilizers | 3 | true |
Ariaden's question on Quantilizers | 2 | true |
QueenDaisy's question on Quantilizers | 2 | true |
Martin Verrisin's question on Quantilizers | 2 | true | Sudonym's Answer to Quantilizers on 2020-12-16T19:38:29 by Martin Verrisin
André GN's question on Quantilizers | 2 | true |
Loligesgame's question on Quantilizers | 2 | true | Plex's Answer to Quantilizers on 2020-12-13T21:55:08 by loligesgame
Qwerty and Azerty's question on Quantilizers | 2 | true | Plex's Answer to Quantilizers on 2020-12-13T22:29:50 by Qwerty and Azerty
PianoShow's question on Quantilizers | 1 | true | Sudonym's Answer to Quantilizers on 2020-12-25T11:41:42 by PianoShow
Mrsuperguy2073's question on Quantilizers | 1 | true | Robertskmiles's Answer to Quantilizers on 2020-12-15T18:18:01 by mrsuperguy2073
Testsubject318 No's question on Quantilizers | 1 | false |
Soken50's question on Quantilizers | 1 | false |
Nutwit's question on Quantilizers | 1 | true | Sudonym's Answer to Quantilizers on 2020-12-15T16:27:49 by Nutwit
Ent229's question on Quantilizers | 1 | true | Sudonym's Answer to Quantilizers on 2020-12-18T23:53:02 by Ent229
JayJaxx's question on Quantilizers | 1 | true |
SocialDownclimber's question on Quantilizers | 1 | true | Plex's Answer to Quantilizers on 2020-12-19T13:13:33 by SocialDownclimber
Yezpahr's question on Quantilizers | 1 | true | Plex's Answer to Quantilizers on 2020-12-19T18:22:43 by Yezpahr
Serenacula's question on Quantilizers | 1 | true | Gelisam's Answer to Quantilizers on 2020-12-21T07:14:00 by Serenacula
AdibasWakfu's question on Quantilizers | 1 | true | Plex's Answer to Quantilizers on 2020-12-15T01:44:10 by AdibasWakfu
Timwi Heizmann's question on Quantilizers | 1 | false |
Matthew Campbell's question on Quantilizers | 1 | true | Plex's Answer to Quantilizers on 2020-12-14T22:43:20 by Matthew Campbell
Nilshi's question on Quantilizers | 1 | true |
Richard Collins's question on Quantilizers | 1 | true | Gelisam's Answer to Quantilizers on 2020-12-13T22:13:57 by Richard Collins
Kierann Temlar's question on Quantilizers | 1 | false |
Me's question on Quantilizers | 1 | true |
Henry H's question on Quantilizers | 1 | true |
Xbzq's question on Quantilizers | 1 | true |
Isaac's question on Quantilizers | 1 | true |
Unknown Username's question on Quantilizers | 1 | true |
Laezar's question on Quantilizers | 1 | true |
Taras Pylypenko's question on Quantilizers | 1 | true | Plex's Answer to Quantilizers on 2020-12-14T07:32:09 by Taras Pylypenko
Martin Verrisin's question on Quantilizers | 1 | true |
Kellen Allen's question on Quantilizers | 0 | true |
Robert Glass's question on Quantilizers | 0 | true |
Bastiaan Cnossen's question on Quantilizers | 0 | true | Plex's Answer to Quantilizers on 2020-12-13T21:53:33 by Bastiaan Cnossen
Boobshart's question on Quantilizers | 0 | true | Sudonym's Answer to Quantilizers on 2020-12-14T01:34:53 by boobshart
Moleo's question on Quantilizers | 0 | true | Robert hildebrandt's Answer to Quantilizers on 2020-12-14T01:27:48 by Moleo
Ricardas Ricardas's question on Quantilizers | 0 | true | Plex's Answer to Quantilizers on 2020-12-14T00:59:38 by Ricardas Ricardas
... further results

Description

How do you get an AI system that does better than a human could, without doing anything a human wouldn't?

A follow-up to "Maximizers and Satisficers": https://youtu.be/Ao4jwLwT36M

The Paper: https://intelligence.org/files/QuantilizersSaferAlternative.pdf
More about this area of research: https://www.alignmentforum.org/tag/mild-optimization

With thanks to my excellent Patreon supporters:
https://www.patreon.com/robertskmiles

Timothy Lillicrap
Gladamas
James
Scott Worley
Chad Jones
Shevis Johnson
JJ Hepboin
Pedro A Ortega
Said Polat
Chris Canal
Jake Ehrlich
Kellen lask
Francisco Tolmasky
Michael Andregg
David Reid
Peter Rolf
Teague Lasser
Andrew Blackledge
Frank Marsman
Brad Brookshire
Cam MacFarlane
Vivek Nayak
Jason Hise
Phil Moyer
Erik de Bruijn
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Allen Faure
Eric James
Matheson Bayley
Qeith Wreid
jugettje dutchking
Owen Campbell-Moore
Atzin Espino-Murnane
Johnny Vaughan
Jacob Van Buren
Jonatan R
Ingvi Gautsson
Michael Greve
Tom O'Connor
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Lupuleasa Ionuț
Cooper Lawton
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
anul kumar sinha
Duncan Orr
Will Glynn
Tyler Herrmann
Tomas Sayder
Ian Munro
Jérôme Beaulieu
Nathan Fish
Taras Bobrovytsky
Jeremy
Vaskó Richárd
Benjamin Watkin
Sebastian Birjoveanu
Andrew Harcourt
Luc Ritchie
Nicholas Guyett
James Hinchcliffe
12tone
Chris Beacham
Zachary Gidwitz
Nikita Kiriy
Parker
Andrew Schreiber
Steve Trambert
Mario Lois
Abigail Novick
heino hulsey-vincent
Fionn
Dmitri Afanasjev
Marcel Ward
Richárd Nagyfi
Andrew Weir
Kabs
Miłosz Wierzbicki
Tendayi Mawushe
Jannik Olbrich
Jake Fish
Wr4thon
Martin Ottosen
Robert Hildebrandt
Andy Kobre
Poker Chen
Kees
Darko Sperac
Paul Moffat
Robert Valdimarsson
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Robin Scharf
Klemen Slavic
Patrick Henderson
Oct todo22
Melisa Kostrzewski
Hendrik
Daniel Munter
Alex Knauth
Kasper
Rob Dawson
Ian Reyes
James Fowkes
Tom Sayer
Len
Alan Bandurka
Ben H
Simon Pilkington
Daniel Kokotajlo
Diagon
Andreas Blomqvist
Bertalan Bodor
David Morgan
Zannheim
Daniel Eickhardt
lyon549
HD
Ihor Mukha
14zRobot
Ivan
Jason Cherry
Igor (Kerogi) Kostenko
ib_
Thomas Dingemanse
Stuart Alldritt
Alexander Brown
Devon Bernard
Ted Stokes
James Helms
Jesper Andersson
Jim T
DeepFriedJif
Chris Dinant
Raphaël Lévy
Johannes Walter
Matt Stanton
Garrett Maring
Anthony Chiu
Ghaith Tarawneh
Julian Schulz
Stellated Hexahedron
Caleb
Scott Viteri
Clay Upton
Conor Comiconor
Michael Roeschter
Georg Grass
Isak
Matthias Hölzl
Jim Renney
Edison Franklin
Piers Calderwood
Krzysztof Derecki
Mikhail Tikhomirov
Richard Otto
Matt Brauer
Jaeson Booker
Mateusz Krzaczek
Artem Honcharov
Michael Walters
Tomasz Gliniecki
Mihaly Barasz
Mark Woodward
Ranzear
Neil Palmere
Rajeen Nabid
Christian Epple
Clark Schaefer
Olivier Coutu
Iestyn bleasdale-shepherd
MojoExMachina
Marek Belski
Eric Eldard
Eric Rogstad
Eric Carlson
Caleb Larson
Braden Tisdale
Max Chiswick
Phillip Brandel

https://www.patreon.com/robertskmiles

Transcript

Hi. So, way back in the before time, I made a video about maximizers and satisficers. The plan was that it would be the first half of a two-parter. Now, I did script out that second video, and shoot it, and even start to edit it, and then certain events transpired and I never finished it. So that's what this is: part two of a video that I started ages ago, which I think most people have forgotten about. I do recommend going back and watching that video if you haven't already, or even re-watching it to remind yourself, so I'll put a link to it in the description. And with that, here's part two.
Take it away, past me.

Hi. In the previous video we looked at utility maximizers, expected utility maximizers, and satisficers, using unbounded and bounded utility functions. A powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that, because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. The situation doesn't look great, so let's try looking at something completely different: let's try to get away from this utility function stuff that seems so dangerous.
What if we just tried to directly imitate humans? If we can get enough data about human behavior, maybe we can train a model that, for any given situation, predicts what a human being would do in that scenario. If the model's good enough, you've basically got a human-level AGI, right? It's able to do a wide range of cognitive tasks, just like a human can, because it's just exactly copying humans. That kind of system won't do a lot of the dangerous, counterproductive things that a maximizer would do, simply because a human wouldn't do them. But I wouldn't exactly call it safe, because a perfect imitation of a human isn't safer than the human it's perfectly imitating, and humans aren't really safe. In principle, a truly safe AGI could be given just about any level of power and responsibility and it would tend to produce good outcomes, but the same can't really be said for humans, and an imperfect human imitation would almost certainly be even worse. I mean, what are the chances that introducing random errors and inaccuracies into the imitation would just happen to make it more safe, rather than less?

Still, it does seem like it would be safer than a utility maximizer; at least we're out of guaranteed-apocalypse territory. But the other thing that makes this kind of approach unsatisfactory is that a human imitation can't exceed human capabilities by much, because it's just copying them, and a big part of why we want AGI in the first place is to get it to solve problems that we can't. You might be able to run the thing faster to allow it more thinking time, or something like that, but that's a pretty limited form of superintelligence, and you have to be very careful with anything along those lines, because it means putting the system in a situation that's very different from anything any human being has ever experienced. Your model might not generalize well to a situation so different from anything in its training data, which could lead to unpredictable and potentially dangerous behavior.
Relatively recently, a new approach was proposed, called quantilizing. The idea is that this lets you combine human imitation and expected utility maximization, to hopefully get some of the advantages of both without all of the downsides.

It works like this. You have your human imitation model: given a situation, it can give you a probability distribution over actions. That's like, for each of the possible actions you could take in this situation, how likely is it that a human would take that action? So in our stamp collecting example, that would be: if a human were trying to collect a lot of stamps, how likely would they be to do this action? Then you have whatever system you'd use for a utility maximizer, which is able to figure out the expected utility of different actions according to some utility function. For any given action, it can tell you how much utility you'd expect to get if you did it; in our example, that's how many stamps you'd expect the action to result in. So for every action you have these two numbers: the human probability and the expected utility. Quantilizing sort of mixes these together, and you get to choose how they're mixed with a variable that we'll call q. If q is zero, the system acts like an expected utility maximizer; if it's one, the system acts like a human imitation. By setting it somewhere in between, we can hopefully get a quantilizer that's more effective than the human imitation, but not as dangerous as the utility maximizer.

So what exactly is a quantilizer? Let's look at the definition in the paper: a q-quantilizer is an agent that, when faced with a decision problem, returns a random action in the top q proportion of some base distribution over actions, sorted by the expected utility achieved if that action is executed.
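Written out in symbols (my own notation for the same idea, not a quotation from the paper): if \gamma is the base distribution over actions and \mathbb{E}[U \mid a] is the expected utility of action a, a q-quantilizer samples from \gamma restricted to the top-utility slice that carries q of the base probability mass, renormalized:

p_q(a) = \begin{cases} \gamma(a)/q & \text{if } \mathbb{E}[U \mid a] \ge t \\ 0 & \text{otherwise} \end{cases}
\quad \text{where } t \text{ is chosen so that } \Pr_{a \sim \gamma}\!\big(\mathbb{E}[U \mid a] \ge t\big) = q.

In particular, no action ever becomes more than 1/q times as likely as it was under the base distribution, which is the property the rest of the video leans on.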
So let's break this down and go through how it works, step by step.

First, we pick a value for q, the variable that determines how we're going to mix imitation and utility maximization. Let's set it to 0.1 for this example: 10%. Now we take all of the available actions and sort them by expected utility. So on one end you've got the actions that kick off all of the crazy, extreme, utility-maximizing strategies, you know, killing everyone and turning the whole world into stamps, all the way down through the moderate strategies, like buying some stamps, and down to all of the strategies that do nothing and collect no stamps at all.

Then we look at our base distribution over actions. What is that? In the version I'm talking about, we're using the human imitation system's probability distribution over actions for this, so our base distribution is how likely a human is to do each action. That might look something like this: no human is ever going to try the wacky extreme maximizing strategies, so our human imitator gives them a probability of basically zero. Then there are some really good strategies that humans probably won't think of, but might if they're really smart or lucky. Then there's a big bump of normal strategies that humans are quite likely to use and that tend to do okay, tailing off into less and less good strategies, and eventually stupider and stupider mistakes that humans are less and less likely to make.

Then what we do is find the point in our action list such that 10% of the probability mass is on the high-expected-utility side. That's what q is really changing: where we make this cutoff. Note that it's not 10% of the actions that would be over here, it's 10% of the probability mass. Then we throw away everything on the right, all the stupid and useless choices; we set them to zero and keep the top 10%. Now this is no longer a valid probability distribution, because it only sums to 0.1, so we multiply everything by 10 so that the whole thing sums to 1 again, and that's our final probability distribution, which we sample from to get our chosen action.
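Here's a minimal sketch of that procedure in code, for the simple case of a finite list of actions. It assumes you've already got the two numbers the video describes for every action, the human-imitation probability and the expected utility; the function and parameter names are made up for illustration, not taken from the paper.

import random

def quantilize(actions, human_prob, expected_utility, q=0.1, rng=random):
    """Sample an action from the top-q slice of the base (human) distribution,
    where 'top' is measured by expected utility.

    actions          -- list of action identifiers
    human_prob       -- dict: action -> probability a human would take it (sums to ~1)
    expected_utility -- dict: action -> expected utility of taking it
    q                -- fraction of base probability mass to keep (0 < q <= 1)
    """
    # Sort actions from highest to lowest expected utility.
    ranked = sorted(actions, key=lambda a: expected_utility[a], reverse=True)

    # Walk down the ranking, keeping actions until we've accumulated
    # q of the base distribution's probability mass.
    kept, mass = [], 0.0
    for a in ranked:
        kept.append(a)
        mass += human_prob[a]
        if mass >= q:
            break

    # Renormalize the kept slice (roughly the "multiply by 10" step for q = 0.1)
    # and sample the chosen action from it.
    weights = [human_prob[a] / mass for a in kept]
    return rng.choices(kept, weights=weights, k=1)[0]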
So let's look at some different actions here and see how they do. Consider something like: misremember your credit card details, and keep trying to order stamps with the wrong number, and you can't figure out why it's not working. A human is reasonably likely to do that. Not very likely, but we've all met people who... The point is, a pure human imitation might do that, but the expected utility maximizer can see that this results in very few stamps, so it ends up low on the list and doesn't make the 10% cutoff. So there are lots of mistakes that a human imitation might make that a quantilizer won't. And note that for our stamp collecting utility function the worst case is zero stamps, but you could imagine that with other utility functions a human imitator could make arbitrarily bad mistakes that a quantilizer would be able to avoid.
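To make that concrete, here's a toy run of the sketch above with a handful of made-up actions and made-up numbers, purely for illustration:

actions = ["turn world into stamps", "clever bulk-buying plan",
           "buy some stamps", "do nothing", "wrong card number"]

human_prob = {            # how likely a human is to do each action (invented)
    "turn world into stamps": 1e-9,
    "clever bulk-buying plan": 0.05,
    "buy some stamps": 0.60,
    "do nothing": 0.25,
    "wrong card number": 0.10,
}
expected_utility = {      # expected number of stamps collected (invented)
    "turn world into stamps": 1e15,
    "clever bulk-buying plan": 5000,
    "buy some stamps": 100,
    "do nothing": 0,
    "wrong card number": 0,
}

# With q = 0.1, the credit-card mistake and "do nothing" fall below the cutoff
# and are thrown away; the extreme world-into-stamps plan stays in the kept
# slice but remains astronomically unlikely, so almost all of the probability
# lands on the sensible, better-than-useless strategies.
print(quantilize(actions, human_prob, expected_utility, q=0.1))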
Now, the most common, boring human strategies, the ones the human imitator is very likely to use, also don't make the cutoff. A 50% quantilizer would have a decent chance of going with one of them, but a 10% quantilizer aims higher than that: the bulk of the probability mass for the 10% quantilizer is in strategies that a human might try and that work significantly better than average. So the quantilizer is kind of like a human on a really good day. It uses the power of the expected utility calculation to be more effective than a pure imitation of a human.

Is it safe, though? After all, many of the insane maximizing strategies are still in our distribution, with hopefully small but still non-zero probabilities, and in fact we multiplied them all by 10 when we renormalized. If there's some chance that a human would go for an extreme utility-maximizing strategy, the 10% quantilizer is 10 times more likely than that. But the probability will still be small; unless you've chosen a very small value for q, your quantilizer is much more likely to go for one of the reasonably high-performing, human-plausible strategies.
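That "10 times more likely" observation is essentially the general safety guarantee the paper builds on (paraphrasing here, so check the paper for the precise statement): because a q-quantilizer never boosts any action's base probability by more than a factor of 1/q, its expected cost under any cost function is at most 1/q times the expected cost of simply sampling from the base distribution:

\mathbb{E}_{a \sim p_q}\!\left[\mathrm{cost}(a)\right] \;\le\; \frac{1}{q}\,\mathbb{E}_{a \sim \gamma}\!\left[\mathrm{cost}(a)\right].

So if sampling a human-plausible action is acceptably safe, a 10% quantilizer is at most ten times worse in expectation.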
And what about stability? Satisficers tend to want to turn themselves into maximizers; does a quantilizer have that problem? Well, the human model should give that kind of strategy a very low probability: a human is extremely unlikely to try to modify themselves into an expected utility maximizer to better pursue their goals. Humans can't really self-modify like that anyway. But a human might try to build an expected utility maximizer rather than trying to become one, and that's kind of worrying, since it's a plan that a human definitely might try, and one that would result in extremely high expected utility. So although a quantilizer might seem like a relatively safe system, it still might end up building an unsafe one.

So how's our safety meter looking? Well, it's progress. Let's keep working on it.
Some of you may have noticed your questions in the YouTube comments being answered by a mysterious bot named Stampy. The way that works is that Stampy cross-posts YouTube questions to the Rob Miles AI Discord, where me and a bunch of patrons discuss them and write replies. Oh yeah, there's a Discord now, for patrons. Thank you to everyone on the Discord who helps reply to comments, and thank you to all of my patrons, all of these amazing people. In this video I'm especially thanking Timothy Lillicrap. Thank you so much for your support, and thank you all for watching. I'll see you next time.