AI That Doesn't Try Too Hard - Maximizers and Satisficers

Channel: Robert Miles
Published: 2019-08-23T15:05:26Z
Views: 145589
Likes: 8090
203 questions on this video!
Question | YouTube Likes | Asked on Discord? | Answered By
Martin Verrisin's question on Maximizers and Satisficers | 327 | false | Abram Demski's Answer to Maximizers and Satisficers on 2019-08-23T16:05:49 by Martin Verrisin
Cobra6x6's question on Maximizers and Satisficers | 186 | false |
Charles Miller's question on Maximizers and Satisficers | 13 | false |
BinaryReader's question on Maximizers and Satisficers | 8 | false |
Grischa's question on Maximizers and Satisficers | 7 | true |
MattettaM's question on Maximizers and Satisficers | 4 | false | Morpheus's Answer to MattettaM's question on Maximizers and Satisficers
Albert Perrien's question on Maximizers and Satisficers | 4 | true |
James Clerk Maxwell's question on Maximizers and Satisficers | 3 | true |
Tobias Görgen's question on Maximizers and Satisficers | 3 | true |
Brandon Frazier's question on Maximizers and Satisficers | 2 | false |
Jimmy Lay's question on Maximizers and Satisficers | 2 | true |
Douglas Jackson's question on Maximizers and Satisficers | 2 | false |
Trophonix's question on Maximizers and Satisficers | 2 | false |
Zestyorangez's question on Maximizers and Satisficers | 1 | false |
Florian Matel's question on Maximizers and Satisficers | 1 | false |
Håkon Egset Harnes's question on Maximizers and Satisficers | 1 | false |
Noel Walters's question on Maximizers and Satisficers | 1 | false |
Qed Soku's question on Maximizers and Satisficers | 1 | false |
Audiodevel.com's question on Maximizers and Satisficers | 1 | false |
Barry Mitchell's question on Maximizers and Satisficers | 1 | false |
Ryan Roberson's question on Maximizers and Satisficers | 1 | false |
Dragon Curve Enthusiast's question on Maximizers and Satisficers | 1 | false |
Keith Barrett's question on Maximizers and Satisficers | 1 | false |
Varstel's question on Maximizers and Satisficers | 1 | false |
Firebrain's question on Maximizers and Satisficers | 1 | false |
Luke Fabis's question on Maximizers and Satisficers | 1 | false |
Thordan Ssoa's question on Maximizers and Satisficers | 1 | false |
Nethan Garvey's question on Maximizers and Satisficers | 1 | false |
YtterbiJum's question on Maximizers and Satisficers | 1 | false |
Yo milo's question on Maximizers and Satisficers | 1 | true |
TheEarphoneguy's question on Maximizers and Satisficers | 1 | true |
Siang Cheng Pang's question on Maximizers and Satisficers | 1 | true |
Aednil's question on Maximizers and Satisficers | 1 | false |
Teh Squirrel's question on Maximizers and Satisficers | 1 | false |
Robot 1g5's question on Maximizers and Satisficers | 1 | false |
Math321's question on Maximizers and Satisficers | 1 | true |
Bloody albatross's question on Maximizers and Satisficers | 1 | false |
DrTryloByte's question on Maximizers and Satisficers | 1 | false |
Greniza *'s question on Maximizers and Satisficers | 1 | false |
Cubedude76's question on Maximizers and Satisficers | 1 | false |
Minion Ninja's question on Maximizers and Satisficers | 1 | true |
Zebobez's question on Maximizers and Satisficers | 1 | false |
Grizzley42's question on Maximizers and Satisficers | 1 | false |
Government Official's question on Maximizers and Satisficers | 0 | false |
GAPIntoTheGame's question on Maximizers and Satisficers | 0 | false |
Matrixar's music workshop's question on Maximizers and Satisficers | 0 | false |
That Scar's question on Maximizers and Satisficers | 0 | false |
Tera Star's question on Maximizers and Satisficers | 0 | false |
Rew Rose's question on Maximizers and Satisficers | 0 | false |
Guy Numbers's question on Maximizers and Satisficers | 0 | false |
... further results

Description

Powerful AI systems can be dangerous in part because they pursue their goals as strongly as they can. Perhaps it would be safer to have systems that don't aim for perfection, and stop at 'good enough'. How could we build something like that?

Generating Fake YouTube comments with GPT-2: https://youtu.be/M6EXmoP5jX8

Computerphile Videos:
Unicorn AI: https://youtu.be/89A4jGvaaKk
More GPT-2, the 'writer' of Unicorn AI: https://youtu.be/p-6F4rhRYLQ
AI Language Models & Transformers: https://youtu.be/rURRYI66E54
GPT-2: Why Didn't They Release It?: https://youtu.be/AJxLtdur5fc
The Deadly Truth of General AI?: https://youtu.be/tcdVC4e6EV4


With thanks to my excellent Patreon supporters:
https://www.patreon.com/robertskmiles

Scott Worley
Jordan Medina
Simon Strandgaard
JJ Hepboin
Lupuleasa Ionuț
Pedro A Ortega
Said Polat
Chris Canal
Nicholas Kees Dupuis
Jake Ehrlich
Mark Hechim
Kellen lask
Francisco Tolmasky
Michael Andregg
Alexandru Dobre
David Reid
Robert Daniel Pickard
Peter Rolf
Chad Jones
Truthdoc
James
Richárd Nagyfi
Jason Hise
Phil Moyer
Shevis Johnson
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Bryce Daifuku
Allen Faure
Eric James
Jonatan R
Ingvi Gautsson
Michael Greve
Julius Brash
Tom O'Connor
Erik de Bruijn
Robin Green
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
Robert Sokolowski
anul kumar sinha
Jérôme Frossard
Sean Gibat
Cooper Lawton
Tyler Herrmann
Tomas Sayder
Ian Munro
Jérôme Beaulieu
Taras Bobrovytsky
Anne Buit
Tom Murphy
Vaskó Richárd
Sebastian Birjoveanu
Gladamas
Sylvain Chevalier
DGJono
Dmitri Afanasjev
Brian Sandberg
Marcel Ward
Andrew Weir
Ben Archer
Scott McCarthy
Kabs
Miłosz Wierzbicki
Tendayi Mawushe
Jannik Olbrich
Anne Kohlbrenner
Jussi Männistö
Mr Fantastic
Wr4thon
Martin Ottosen
Archy de Berker
Marc Pauly
Joshua Pratt
Andy Kobre
Brian Gillespie
Martin Wind
Peggy Youell
Poker Chen
Kees
Darko Sperac
Truls
Paul Moffat
Anders Öhrt
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Robin Scharf
Oren Milman
John Rees
Seth Brothwell
Clark Mitchell
Kasper Schnack
Michael Hunter
Klemen Slavic
Patrick Henderson
Long Nguyen
Melisa Kostrzewski
Hendrik
Daniel Munter
Graham Henry
Volotat
Duncan Orr
Marin Aldimirov
Bryan Egan
James Fowkes
Frame Problems
Alan Bandurka
Benjamin Hull
Tatiana Ponomareva
Aleksi Maunu
Michael Bates
Simon Pilkington
Dion Gerald Bridger
Steven Cope
Marcos Alfredo Núñez
Petr Smital
Daniel Kokotajlo
Fionn
Yuchong Li
Nathan Fish
Diagon
Parker Lund
Russell schoen
Andreas Blomqvist
Bertalan Bodor
David Morgan
Ben Schultz
Zannheim
Daniel Eickhardt
lyon549
HD

https://www.patreon.com/robertskmiles

Transcript

Hi. So, way back when I started this online AI safety videos thing on Computerphile, I was talking about how you have a problem when you maximize just about any simple utility function. The example I used was an AI system meant to collect a lot of stamps, which works like this: the system is connected to the Internet, and for all sequences of packets it could send, it simulates exactly how many stamps would end up being collected after one year if it sent those packets. It then selects the sequence with the most stamps and sends that. This is what's called a utility maximizer, and it seems like any utility function you give this kind of system as a goal, it does it to the max. Utility maximizers tend to take extreme actions: they're happy to destroy the whole world just to get a tiny increase in the output of their utility functions, so unless the utility function lines up exactly with human values, their actions are pretty much guaranteed to be disastrous.
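
To make the setup concrete, here is a minimal sketch of that kind of agent (illustrative only, not code from the video; candidate_plans and simulate_stamps are hypothetical stand-ins for the system's impossible ability to enumerate and perfectly simulate every plan):

    # Not from the video: a minimal sketch of the stamp collector as a pure
    # utility maximizer. candidate_plans and simulate_stamps are hypothetical
    # stand-ins for "every sequence of packets it could send" and "simulate
    # exactly how many stamps each one collects after a year".

    def utility(stamps: int) -> int:
        # Unbounded utility: more stamps is always strictly better.
        return stamps

    def choose_plan(candidate_plans, simulate_stamps):
        # Evaluate every plan and pick whichever scores highest,
        # no matter how extreme that plan is.
        return max(candidate_plans, key=lambda plan: utility(simulate_stamps(plan)))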
Intuitively, the issue is that utility maximizers have precisely zero chill. To anthropomorphize horribly, they seem to have a frantic, obsessive, maniacal attitude. We find ourselves wanting to say: look, could you just dial it back a little? Can you just relax, just a bit? So suppose we want a lot of stamps, but, like, not that many. It must be possible to design a system that just collects a bunch of stamps and then stops, right? How can we do that? Well, the first obvious issue with the existing design is that the utility function is unbounded: the more stamps the better, with no limit. However many stamps it has, it can get more utility by getting one more stamp. Any world where humans are alive and happy is a world that could have more stamps in it, so the maximum of this utility function is the end of the world.
Let's say we only really want a hundred stamps. So what if we make a bounded utility function that returns whichever is smaller: the number of stamps, or 100? Getting a hundred stamps from eBay gives 100 utility; converting the whole world into stamps also gives 100 utility. This function is totally indifferent between all outcomes that contain at least a hundred stamps. So what does a maximizer of this utility function actually do? Now the system's behavior is no longer really specified. It will do one of the things that results in a hundred utility, which includes a bunch of perfectly reasonable behaviors that the programmer would be happy with, and a bunch of apocalypses, and a bunch of outcomes somewhere in between. If you select at random from all courses of action that result in at least 100 stamps, what proportion of those are actually acceptable outcomes for humans? I don't know; probably not enough. This is still a step up, though, because the previous utility function was guaranteed to kill everyone, and this new one has at least some probability of doing the right thing.
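
As a quick sketch, that bounded utility function and the indifference it creates look like this (illustrative only):

    def bounded_utility(stamps: int) -> int:
        # Capped at 100: every outcome with at least 100 stamps is worth the same.
        return min(stamps, 100)

    # Ordering 100 stamps from eBay and converting the world into stamps
    # both score exactly 100, so a maximizer is indifferent between them.
    assert bounded_utility(100) == bounded_utility(10**15) == 100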
But actually, of course, this utility maximizer concept is too unrealistic, even in the realm of talking about hypothetical agents in the abstract. In the thought experiment, our stamp collector system is able to know with certainty exactly how many stamps any particular course of action will result in, but you just can't simulate the world that accurately. It's more than just computationally intractable; it's probably not even allowed by physics. Pure utility maximization is only available for very simple problems where everything is deterministic and fully known. If there's any uncertainty, you have to do expected utility maximization.

This is pretty straightforwardly how you'd expect to apply uncertainty to this situation: the expected utility of a choice is the utility you'd expect to get from it on average. So, like, suppose there's a button that flips a coin, and if it's tails you get 50 stamps, and if it's heads you get 150 stamps. In expectation this results in a hundred stamps, right? It never actually returns 100, but on average that's what you get; that's the expected number of stamps. To get the expected utility, you just apply your utility function to each of the outcomes before you do the rest of the calculation. So if your utility function is just "how many stamps do I get?", then the expected utility of the button is 100. But if your utility function is capped at a hundred, for example, then the outcome of winning one hundred and fifty stamps is now only worth a hundred utility, so the expected utility of the button is only 75. Now suppose there were a second button that gives either eighty or ninety stamps, again with 50/50 probability. This gives 85 stamps in expectation, and since none of the outcomes are more than 100, both of the functions value this button at 85 utility. So this means the agent with the unbounded utility function would prefer the first button, with its expected 100 stamps, but the agent with the bounded utility function would prefer the second button, since its expected utility of 85 is higher than the first button's expected utility of 75. This makes the bounded utility function feel a little safer: in this case it actually makes the agent prefer the option that results in fewer stamps, because it just doesn't care about any stamps past 100.
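
Here is that comparison worked through numerically (a sketch using the numbers from the example):

    def expected_utility(lottery, utility):
        # lottery is a list of (probability, stamps) pairs; apply the utility
        # function to each outcome *before* averaging.
        return sum(p * utility(stamps) for p, stamps in lottery)

    button_1 = [(0.5, 50), (0.5, 150)]   # tails: 50 stamps, heads: 150 stamps
    button_2 = [(0.5, 80), (0.5, 90)]    # either 80 or 90 stamps

    unbounded = lambda s: s
    bounded = lambda s: min(s, 100)

    print(expected_utility(button_1, unbounded))  # 100.0 -> unbounded agent picks button 1
    print(expected_utility(button_2, unbounded))  # 85.0
    print(expected_utility(button_1, bounded))    # 75.0
    print(expected_utility(button_2, bounded))    # 85.0  -> bounded agent picks button 2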
In the same way, let's consider some risky, extreme stamp-collecting plan. This plan is pretty likely to fail, and in that case the agent might be destroyed and get no stamps, but if the plan succeeds the agent could take over the world and get a trillion stamps. An agent with an unbounded utility function would rate this plan pretty highly: the huge utility of taking over the world makes the risk worth it. But the agent with the bounded utility function doesn't prefer a trillion stamps to a hundred stamps; it only gets 100 utility either way, so it would much prefer a conservative strategy that just gets a hundred stamps with high confidence.

But how does this kind of system behave in the real world, where you never really know anything with absolute certainty? The pure utility maximizer that effectively knows the future can order a hundred stamps and know that it will get 100 stamps, but the expected utility maximizer doesn't know for sure: the seller might be lying, the package might get lost, and so on. So the expected utility of ordering a hundred stamps is a bit less than 100. If there's a 1% chance that something goes wrong and we get 0 stamps, then our expected utility is only 99. That's below the limit of 100, so we can improve it by ordering some extras to be on the safe side. Maybe we order another 100; now our expected utility is 99.99. Still not a hundred, so we should order some more just in case; now we're at 99.9999. The expected value of a utility function that's bounded at 100 can never actually hit 100: you can always become slightly more certain that you've got at least 100 stamps. Better turn the whole world into stamps, because, hey, you never know. So an expected utility maximizer with a bounded utility function ends up pretty much as dangerous as one with an unbounded utility function.
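
The "order some extras just in case" dynamic can be put in numbers with a toy model; the failure model here is an assumption for illustration, but it reproduces the 99 / 99.99 / 99.9999 sequence from above:

    # Toy model (an assumption, not from the video): each order of 100 stamps
    # independently fails with probability 1%, and the agent only ends up below
    # 100 stamps if every order fails, scoring 0 utility in that case.
    def expected_bounded_utility(orders: int, p_fail: float = 0.01) -> float:
        p_all_fail = p_fail ** orders
        return 100 * (1 - p_all_fail)

    for orders in (1, 2, 3):
        print(orders, expected_bounded_utility(orders))
    # 1 -> 99.0, 2 -> 99.99, 3 -> 99.9999 (up to float rounding): the expected
    # utility creeps toward 100 but never reaches it, so there is always
    # something to gain by ordering more.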
OK, what if we try to limit it from both sides? Like, you get a hundred utility if you have a hundred stamps, and zero otherwise. Now it's not going to collect a trillion stamps just to be sure; it will collect exactly 100 stamps. But it's still incentivized to take extreme actions to be sure that it really does have a hundred, like turning the whole world into highly elaborate stamp-counting and recounting machinery, getting slightly more expected utility every time it checks again.
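
In expectation terms that incentive looks like this (a sketch; the probabilities are made-up illustrations):

    # With the all-or-nothing utility function, expected utility is just 100
    # times the probability of really having 100 stamps, so anything that
    # nudges that probability up is worth doing.
    def expected_all_or_nothing_utility(p_have_100: float) -> float:
        return 100 * p_have_100

    print(expected_all_or_nothing_utility(0.999))     # 99.9   after a few checks
    print(expected_all_or_nothing_utility(0.999999))  # 99.9999 after many more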
It seems like whatever we try to maximize, it causes problems. So maybe we could try not maximizing; maybe we could try what's called satisficing. Rather than trying to get our utility function to return as high a value as possible in expectation, what if we set a threshold, and accept any strategy that passes that threshold? In the case of the stamp collector, that would look like: look through possible ways you could send out packets, calculate how many stamps you'd expect to collect on average with each strategy, and as soon as you hit one that you expect to get at least 100 stamps, just go with that one.
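
A sketch of that satisficer (illustrative only; candidate_plans and expected_stamps are hypothetical stand-ins, as before):

    def choose_plan_satisficing(candidate_plans, expected_stamps, threshold: int = 100):
        # Take the first plan whose expected stamp count clears the threshold,
        # rather than searching for the best plan overall.
        for plan in candidate_plans:
            if expected_stamps(plan) >= threshold:
                return plan
        return None  # no satisfactory plan found

Which plan this returns depends entirely on the order in which candidate_plans is enumerated, which is exactly the issue discussed next.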
This satisficer seems to get us to about where we were with the pure utility maximizer with a bounded utility function. It's not clear exactly what it will do, except that it will do one of the things that results in more than a hundred stamps in expectation, which again includes a lot of sensible behaviors, and a lot of apocalypses, and a lot of things somewhere in between. Since the system implements the first satisfactory strategy it finds, the specific behavior depends on the order in which it considers the options. What order would it use? Well, one obvious approach is to go with the simplest or shortest plans first; after all, any plan that takes over the world probably requires much more complexity than just ordering some stamps on eBay.

But consider the following plan: get into your own source code and change yourself from a satisficer into a maximizer. All you're doing there is changing a few lines of code on your own system, so this is a pretty simple plan that's likely to be considered fairly early on. It might not be simpler than just ordering some stamps, but that's not much reassurance: the more challenging the task we give our AGI, the more likely it is that it will hit on this kind of self-modification strategy before any legitimate ones. And the plan certainly satisfies the search criteria: if you change yourself into a maximizer, that maximizer will predictably find and implement some plan that results in a lot of stamps, so you can tell that the expected stamp output of the "become a maximizer" plan is satisfactorily high, even without knowing what plan the maximizer will actually implement. So satisficers kind of want to become maximizers, which means that being a satisficer is unstable as a safety feature: it tends to uninstall itself.
So, to recap: a powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function, it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that, because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. So currently things aren't looking great, but we're not done: people have thought of more approaches, and we'll talk about some of those in the next video.
I want to end the video with a big thank you to all of my wonderful patrons, that's all of these great people right here. In this video I'm especially thanking Simon Strandgaard. Thank you so much; you know, thanks to your support I was able to buy this boat. (For this, I bought a green screen actually, but I like it, because it lets me make videos like this one that I put up on my second channel, where I used GPT-2 to generate a bunch of fake YouTube comments and read them.) That video ties in with three other videos I made with Computerphile, talking about the ethics of releasing AI systems that might have malicious uses, so you can check all of those out; there are links in the description. Thank you again to my patrons, and thank you all for watching. I'll see you next time.