Quantilizers: AI That Doesn't Try Too Hard

Channel: Robert Miles
Published: 2020-12-13T20:46:21Z
How do you get an AI system that does better than a human could, without doing anything a human wouldn't?

A follow-up to "Maximizers and Satisficers": https://youtu.be/Ao4jwLwT36M

The Paper: https://intelligence.org/files/QuantilizersSaferAlternative.pdf
More about this area of research: https://www.alignmentforum.org/tag/mild-optimization

hi so way back in the before time
i made a video about maximizers and satisficers
the plan was that was going to be the
first half of a two-parter now i did
script out that second video
and shoot it and even start to edit it
and then
certain events transpired and i never
finished that video so that's what this is
part two of a video that i started ages ago
which i think most people have forgotten
about so i do recommend going back
and watching that video if you haven't
already or even re-watching it to remind
yourself so i'll put a link to that in
the description and with that here's
part two
take it away past me hi in the previous
video we looked at utility maximizers
expected utility maximizers and satisficers
using unbounded and bounded utility
functions a powerful utility maximizer
with an unbounded utility function
is a guaranteed apocalypse with a
bounded utility function it's better
in that it's completely indifferent
between doing what we want and disaster
but we can't build that
because it needs perfect prediction of
the future so it's more realistic to
consider an expected utility maximizer
which is a guaranteed apocalypse even
with a bounded utility function
now an expected utility satisficer gets
us back up to indifference between good
outcomes and apocalypses
but it may want to modify itself into a
maximizer and there's nothing to stop it
from doing that the situation
doesn't look great so let's try looking
at something completely different let's
try to get away from this utility
function stuff that seems so dangerous
what if we just tried to directly
imitate humans if we can get enough data
about human behavior
maybe we can train a model that for any
given situation
predicts what a human being would do in
that scenario if the model's good enough
you've basically got a human level agi
it's able to do a wide range of
cognitive tasks just like a human can
because it's just
exactly copying humans that kind of
system won't do a lot of the dangerous
counterproductive things that a
maximizer would do simply because a
human wouldn't do them
but i wouldn't exactly call it safe
because a perfect imitation of a human
isn't safer than the human it's
perfectly imitating and humans aren't
really safe
in principle a truly safe agi could be
given just about any level of power and
it would tend to produce good
outcomes but the same can't really be
said for humans and an imperfect human
imitation would almost certainly be even worse
i mean what are the chances that
introducing random errors and
inaccuracies to the imitation
would just happen to make it more safe
rather than less still
it does seem like it would be safer than
a utility maximizer
at least we're out of guaranteed
apocalypse territory but the other thing
that makes this kind of approach
unsatisfactory is
a human imitation can't exceed human
capabilities by much
because it's just copying them a big
part of why we want agi in the first place
is to get it to solve problems that we
can't you might be able to run the thing
faster to allow it more thinking time or
something like that but
that's a pretty limited form of super
intelligence and you have to be very
careful with anything along those lines
it means putting the system in a
situation that's very different from
anything any human being has ever
experienced your model might not
generalize well to a situation so
different from anything in its training data
which could lead to unpredictable and
potentially dangerous behavior
relatively recently a new approach was
proposed called quantilizing the idea is
that this lets you combine human
imitation and expected utility maximization
to hopefully get some of the advantages
of both without all of the downsides
it works like this you have your human
imitation model given a situation
it can give you a probability
distribution over actions that's like
for each of the possible actions you
could take in this situation
how likely is it that a human would take
that action so in our stamp collecting
example that would be
if a human were trying to collect a lot
of stamps how likely would they be to do
this action
then you have whatever system you'd use
for a utility maximizer
that's able to figure out the expected
utility of different actions
according to some utility function for
any given action it can tell you
how much utility you'd expect to get if
you did that so in our example that's
how many stamps would you expect this
action to result in so for every action
you have these two numbers
the human probability and the expected
utility quantilizing
sort of mixes these together and you get
to choose how they're mixed with a
variable that we'll call
q if q is zero the system acts like an
expected utility maximizer
if it's one the system acts like a human
imitation by setting it somewhere in between
we can hopefully get a quantilizer that's
more effective than the human imitation
but not as dangerous as the utility maximizer
so what exactly is a quantilizer let's
look at the definition in the paper
a q-quantilizer is an agent that when
faced with a decision problem
returns a random action in the top q
proportion of some base distribution
over actions
sorted by the expected utility achieved
if that action is executed
so let's break this down and go through
how it works step by step
first we pick a value for q the variable
that determines how we're going to mix
imitation and utility maximization let's
set it to 0.1 for this example
10 percent now we take all of the available actions
and sort them by expected utility so on
one end you've got the actions that kick
off all of the crazy extreme utility
maximizing strategies
you know killing everyone and turning
the whole world into stamps all the way
through the moderate strategies like
buying some stamps
and down to all of the strategies that
do nothing and collect no stamps at all
then we look at our base distribution
over actions what is that
in the version i'm talking about we're
using the human imitation system's
probability distribution over actions
for this
so our base distribution is how likely a
human is to do each action
that might look something like this no
human is ever going to try the wacky
extreme maximizing strategies so our
human imitator gives them a probability
of basically zero then there are some
really good strategies that humans
probably won't think of but they might
if they're really smart or lucky
then a big bump of normal strategies
that humans are quite likely to use that
tend to do okay
then tailing off into less and less good
strategies and
eventually stupider and stupider
mistakes the humans are less and less
likely to make
then what we do is we find the point in
our action list
such that 10 percent of the probability mass is
on the high
expected utility side so that's what q
is really changing it's where we make
this cutoff
note that it's not ten percent of the
actions that would be over here
it's ten percent of the probability mass
then we throw away everything on the other side
all the stupid and useless choices we
set them to zero and we keep the top ten percent
now this is no longer a valid
probability distribution because it only
sums up to 0.1
so we multiply all of these by 10 so
that the whole thing sums to 1 again
and that's our final probability
distribution which we sample from to get
our chosen action
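
The steps just described (sort by expected utility, cut at the top q of the probability mass, zero out the rest, renormalize, sample) can be sketched in Python. This is a minimal illustration, not code from the paper: the function name `quantilize`, the action names, and every number below are made up for the example, and letting an action that straddles the cutoff keep only the part of its mass inside the top q is an assumption about how to handle a discrete action list.

```python
import random


def quantilize(base_prob, expected_utility, q):
    """Return the q-quantilized distribution as {action: probability}.

    base_prob:        {action: probability a human would take it}, sums to 1
    expected_utility: {action: expected utility of taking that action}
    q:                fraction of base probability mass to keep, 0 < q <= 1
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    # Sort actions from highest to lowest expected utility.
    ranked = sorted(base_prob, key=lambda a: expected_utility[a], reverse=True)
    remaining = q  # probability mass still to be collected
    result = {}
    for action in ranked:
        if remaining <= 1e-12:
            break  # everything past the cutoff is set to zero
        # Assumption: an action straddling the cutoff keeps only the
        # part of its mass that falls inside the top q.
        mass = min(base_prob[action], remaining)
        if mass > 0:
            result[action] = mass / q  # renormalize so the total is 1
        remaining -= mass
    return result


# Hypothetical numbers for the stamp-collecting example.
base_prob = {
    "turn the world into stamps": 0.001,
    "find a clever bulk deal": 0.049,
    "buy stamps online": 0.55,
    "use the wrong credit card number": 0.2,
    "do nothing": 0.2,
}
expected_stamps = {
    "turn the world into stamps": 1e9,
    "find a clever bulk deal": 500,
    "buy stamps online": 100,
    "use the wrong credit card number": 1,
    "do nothing": 0,
}

dist = quantilize(base_prob, expected_stamps, q=0.1)
# Sample the chosen action from the quantilized distribution.
chosen = random.choices(list(dist), weights=list(dist.values()))[0]
print(dist, chosen)
```

With these made-up numbers only the three highest-utility actions survive the 10 percent cutoff, and "buy stamps online" keeps just the slice of its mass that fits inside the top q.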
so let's look at some different actions
here and see how they do
consider something like misremember your
credit card details
and keep trying to order stamps with the
wrong number and you can't figure out
why it's not working
a human is reasonably likely to do that
not very likely
but we've all met people who point is a
pure human imitation
might do that but the expected utility
maximizer can see that this results in
very few stamps
so it ends up low on the list and
doesn't make the 10 percent cutoff
so there are lots of mistakes that a
human imitation might make that a
quantilizer won't
and note that for our stamp collecting
utility function the worst case is zero
but you could imagine with other utility
functions a human imitator could make
arbitrarily bad mistakes that a
quantilizer would be able to avoid
now the most common boring human
strategies that the human imitator is
very likely to use
also don't make the cutoff a 50 percent
quantilizer would have a decent chance
of going with one of them
but a 10 percent quantilizer aims higher than that
the bulk of the probability mass for the
quantilizer is in strategies that a
human might try
that work significantly better than
average so the quantilizer is kind of
like a human on a really good day
it uses the power of the expected
utility calculation to be more effective
than a pure imitation of a human
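
To see why a quantilizer does better than pure imitation, here is a rough Python sketch that computes the expected utility of the quantilized distribution for a few values of q. The helper name `expected_utility_at` and the (probability, utility) pairs are hypothetical, chosen only to make the arithmetic easy to follow. The extreme maximizing strategies are left out of this toy list so they don't swamp the numbers.

```python
def expected_utility_at(q, outcomes):
    """Expected utility of a q-quantilizer.

    outcomes: list of (base_probability, expected_utility) pairs,
              one pair per action, with probabilities summing to 1
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    remaining = q  # top-q probability mass still to be collected
    total = 0.0
    # Walk the actions from best to worst expected utility.
    for prob, utility in sorted(outcomes, key=lambda pu: pu[1], reverse=True):
        mass = min(prob, remaining)
        total += mass * utility
        remaining -= mass
        if remaining <= 0:
            break
    return total / q  # renormalize by the mass that was kept


# Hypothetical (human probability, expected stamps) pairs.
outcomes = [(0.05, 500), (0.55, 100), (0.20, 1), (0.20, 0)]

results = {q: expected_utility_at(q, outcomes) for q in (1.0, 0.5, 0.1, 0.05)}
for q, value in results.items():
    print(f"q = {q}: expected stamps = {value:.1f}")
```

With q = 1 this is just the human imitator's average (about 80 stamps here). As q shrinks to 0.5, 0.1 and 0.05 the expected number of stamps climbs to about 140, 300 and 500, because more and more of the mediocre strategies are cut away.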
is it safe though after all many of the
insane maximizing strategies
are still in our distribution with
hopefully small but still non-zero probabilities
and in fact we multiplied them all by 10
when we renormalized if there's some chance
that a human would go for an extreme
utility maximizing strategy
the 10 percent quantilizer is 10 times
more likely than that
but the probability will still be small
unless you've chosen a very small value
for q
your quantilizer is much more likely to
go for one of the reasonably high
performing human plausible strategies
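
The "10 times more likely" point can be written as a small worked bound. The notation here is assumed for illustration: p(a) is the human imitation's probability for action a, and p_q(a) is the probability the q-quantilizer gives it. The example value of one in a million is hypothetical.

```latex
% p(a):   base (human-imitation) probability of action a
% p_q(a): probability the q-quantilizer assigns to action a
\[
  p_q(a) =
  \begin{cases}
    p(a)/q & \text{if } a \text{ lies in the top } q \text{ of the probability mass,} \\
    0      & \text{otherwise,}
  \end{cases}
  \qquad\Longrightarrow\qquad
  p_q(a) \le \frac{p(a)}{q}.
\]
% Worked example with q = 0.1 and a hypothetical p(a) = 10^{-6}:
\[
  p_q(a) \le \frac{10^{-6}}{0.1} = 10^{-5}.
\]
```

So renormalizing can inflate any single action's probability by at most a factor of 1/q. An action that straddles the cutoff gets less than that, which is why the statement is an inequality. The bound only gets weak when q itself is very small.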
and what about stability satisficers
tend to want to turn themselves into
maximizers does a quantilizer have that problem
well the human model should give that
kind of strategy a very low probability
a human is extremely unlikely to try to
modify themselves into an expected
utility maximizer to better pursue their goals
humans can't really self-modify like
that anyway but a human might try to
build an expected utility maximizer
rather than trying to become one that's
kind of worrying since
it's a plan that a human definitely
might try that would result in extremely
high expected utility
so although a quantilizer might seem
like a relatively safe system
it still might end up building an unsafe
one so how's our safety meter looking
well it's progress let's keep working on it
some of you may have noticed your
questions in the youtube comments being
answered by a mysterious bot named
stampy the way that works is stampy
cross posts youtube questions
to the rob miles ai discord where me and
a bunch of patrons discuss them and
write replies
oh yeah there's a discord now for
patrons thank you to everyone on the
discord who helps reply to comments
and thank you to all of my patrons all
of these amazing people
in this video i'm especially thanking
timothy lillicrap
thank you so much for your support and
thank you all for watching
i'll see you next time