Safe Exploration: Concrete Problems in AI Safety Part 6

From Stampy's Wiki

Channel: Robert Miles
Published: 2018-09-21T11:20:53Z
Views: 65707
Likes: 3312


To learn, you need to try new things, but that can be risky. How do we make AI systems that can explore safely?

Playlist of the series so far:
The paper, 'Concrete Problems in AI Safety':

AI Safety Gridworlds:
Why Would AI Want to do Bad Things? Instrumental Convergence:
Scalable Supervision: Concrete Problems in AI Safety Part 5:
The Evolved Radio and its Implications for Modelling the Evolution of Novel Sensors:

With thanks to my excellent Patreon supporters:

Jason Hise
Jason Strack
Stefan Skiles
Jordan Medina
Scott Worley
JJ Hepboin
Alex Flint
Pedro A Ortega
James McCuen
Richárd Nagyfi
Alec Johnson
Clemens Arbesser
Simon Strandgaard
Jonatan R
Michael Greve
The Guru Of Vision
Alexander Hartvig Nielsen
David Tjäder
Julius Brash
Tom O'Connor
Ville Ahlgren
Erik de Bruijn
Robin Green
Maksym Taran
Laura Olds
Jon Halliday
Bobby Cold
Paul Hobbs
Jeroen De Dauw
Tim Neilson
Eric Scammell
christopher dasenbrock
Igor Keller
Ben Glanton
Robert Sokolowski
Vlad D
Jérôme Frossard
Lupuleasa Ionuț
Sylvain Chevalier
Scott Stevens
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Marcel Ward
Andrew Weir
Taylor Smith
Ben Archer
Scott McCarthy
Phil Moyer
Tendayi Mawushe
Anne Kohlbrenner
Bjorn Nyblad
Jussi Männistö
Mr Fantastic
Matanya Loewenthal
Dave Tapley
Archy de Berker
Pablo Eder
Marc Pauly
Joshua Pratt
Gunnar Guðvarðarson
Shevis Johnson
Andy Kobre
Manuel Weichselbaum
Brian Gillespie
Martin Wind
Peggy Youell
Poker Chen
Darko Sperac
Paul Moffat
Jelle Langen
Lars Scholz
Anders Öhrt
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Robin Scharf
Oren Milman
John Rees
Shawn Hartsock
Seth Brothwell
Brian Goodrich
Michael S McReynolds

Media Sources:
"DashCam Russia - Crazy Drivers and Car Crashes 2018" (
Optimist Prime
"Hapless Boston Dynamics robot in shelf-stacking fail" (
"The Simpsons - Bart Gets Famous" (c) Fox 1994
"Donald Duck - Cured Duck" (c) Disney 1945
"Vase Breaking Slow Motion" (
"Fastest quadcopter i've ever flown + Most Destructive Crash" (
"An athlete uses physics to shatter world records - Asaf Bar-Yosef" (
"Uber self-driving car crash in Tempe, Arizona" (
"Quadcopter Fx Simulator" (
"Fallout - New Vegas by progamingwithed in 24:00 - AGDQ 2017 - Part 59" (
"Far Cry 5 out of 5 Physics Simulation" (


hi this is the latest video in a series
about the paper concrete problems in AI
safety you don't need to have seen the
previous videos for this but I'd
recommend checking them out anyway
there's a link in the description today
we're going to talk about safe
exploration so in an earlier video we
talked about the trade-off between
exploration and exploitation this is
kind of an inherent trade-off that all
agents face which just comes from the
fact that you're trying to do two jobs
at the same time one figure out what
things give you reward and two do the
things that give you a reward like
imagine you're in a restaurant you've
been to this restaurant before so you've
already tried some of the dishes now you
can either order something you've
already had that you know is quite good
ie you can exploit your current
knowledge or you can try ordering
something new off the menu that you've
never had before
you can explore to gain more knowledge
if you focus too much on exploring then
you're spending all of your time trying
random things when actually you may have
already found the thing that's best for
you but if you don't explore enough then
you might end up missing out on
something great finding the right
balance is an interesting problem now
the most naive form of reinforcement
learning is just to always do whichever
action you expect will give you the most
reward but agents that work this way end
up actually not doing very well because
as soon as they find something that
works a bit they just always do that
forever and never try anything else like
someone who just always orders the same
thing at the restaurant even though they
haven't tried most of the other things
on the menu in the gridworlds video I
explained that one approach to
exploration in reinforcement learning is
to have an exploration rate so the
system will choose an action which it
thinks will give it the highest reward
something like 99% of the time but a
random 1% of the time it will just pick
an action completely at random this way
the system is generally doing whatever
will maximize its reward but it will
still try new things from time to time
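
The exploration rate described here is commonly called epsilon-greedy action selection. A minimal sketch in Python, assuming the agent keeps a list of estimated rewards with one entry per action (the function name `epsilon_greedy` and the `estimates` list are illustrative, not taken from the video):

```python
import random


def epsilon_greedy(estimates, epsilon=0.01, rng=random):
    """Return an action index.

    With probability `epsilon` pick uniformly at random (explore);
    otherwise pick the action with the highest estimated reward (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: any action at all
    # exploit: index of the largest estimate (first one wins on ties)
    return max(range(len(estimates)), key=lambda a: estimates[a])
```

With `epsilon=0.01` this matches the 99% / 1% split from the example. Note that the random branch has no notion of which actions are dangerous, which is exactly the safety problem raised next.
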
this is a pretty basic approach and I
think you can see how that could cause
safety problems imagine a self-driving
car which 99% of the time does what it
thinks is the best choice of action and
1% of the time sets the steering wheel
or the accelerator or the brake to a
random value just to find out what would
happen that system might learn some
interesting things about vehicle
handling but at what cost
clearly this is an unsafe approach okay
so that's a very simple way of doing
exploration there are other ways of
doing it one approach is a sort of
artificial optimism
rather than implicitly giving unknown
actions zero expected reward or whatever
your best guess of the expected reward
of taking a random action would be you
artificially give them high expected
reward so that the system is sort of
irrationally optimistic about unknown
things whenever there's anything it
hasn't tried before it will assume that
it's good until it's tried it and found
out that it isn't so you end up with a
system that's like those people who say
oh I'll try anything once that's not
always a great approach in real life
there are a lot of things that you
shouldn't try even once and hopefully
you can see that that kind of approach
is unsafe for AI systems as well
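
One common way to get this kind of artificial optimism is optimistic initial values: every untried action starts with an estimate higher than any reward it could really give. The sketch below assumes deterministic rewards and a purely greedy agent; the function name, the `true_rewards` list and the value `10.0` are all illustrative assumptions.

```python
def run_optimistic_greedy(true_rewards, steps, optimistic_value=10.0):
    """Greedy agent whose untried actions start at `optimistic_value`.

    `optimistic_value` is assumed to exceed every real reward, so each
    action looks attractive until it has been tried once.
    """
    n = len(true_rewards)
    estimates = [optimistic_value] * n  # irrationally high starting guesses
    counts = [0] * n
    history = []
    for _ in range(steps):
        # always take the action that currently looks best
        action = max(range(n), key=lambda a: estimates[a])
        reward = true_rewards[action]
        counts[action] += 1
        # running average; the first real reward replaces the optimistic guess
        estimates[action] += (reward - estimates[action]) / counts[action]
        history.append(action)
    return history, estimates
```

Calling `run_optimistic_greedy([1.0, 0.0, 0.5], 5)` tries every action once before settling on the best one. If one of those actions were catastrophic, the agent would still try it once.
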
you can't safely assume that anything
you haven't tried must be good now it's
worth noting that in more complex
problems these kinds of exploration
methods that involve occasionally doing
individual exploratory actions don't
perform very well in a complex problem
space you're pretty unlikely to find new
and interesting approaches just by
taking your current approach and
applying some random permutation to it
so one approach that people use is to
actually modify the goals of the system
temporarily to bring the system into new
areas of the space that it hasn't been
in before
imagine that you're learning to play
chess by playing against the computer
and you're kind of in a rut with your
strategy you're always playing
similar-looking games so you might want
to say to yourself okay this game rather
than my normal strategy I'll just try to
take as many of the opponent's pieces as
possible or this game I'll just try to
move my pieces as far across the board
as possible or I'll just try to capture
the Queen at all costs or something like
that you temporarily follow some new
policy which is not the one you'd
usually think is best and in doing that
you can end up visiting board states
that you've never seen before
and learning new things about the game
which in the long run can make you a
better player temporarily modifying your
goals allows you to explore the policy
space better than you could by just
sometimes playing a random move but you
can see how implementing this kind of
thing on a real-world AI system could be
much more dangerous than just having
your system sometimes choose random
actions if your cleaning robot
occasionally makes totally random motor
movements in an attempt to do
exploration that's mostly just going to
make it less effective it might drop
things or fall over and that could be a
bit dangerous but what if it sometimes
exhibited coherent goal-directed
behavior towards randomly chosen goals
what if as part of its exploration it
occasionally picks a new goal at random
and then puts together intelligent
multi-step plans to pursue that goal that
could be much more dangerous than just
doing random things and the problem
doesn't come from the fact that the new
goals are random just that they're
different from the original goals
choosing non-randomly might not be any
better you might imagine an AI system
where some part of the architecture is
sort of implicitly reasoning something
like part of my goal is to avoid
breaking this vase but we've never
actually seen the vase being broken so
the system doesn't have a very good
understanding of how that happens so
maybe we should explore by temporarily
replacing the goal with one that values
breaking vases just so that the system
can break a bunch of vases and get a
sense for how that works temporarily
replacing the goal can make for good
learning and effective exploration but
it's not safe so the sorts of simple
exploration methods that we're using with
current systems can be dangerous when
directly applied to the real world
now that vase example was kind of silly
a system sophisticated enough to reason
about its state of knowledge like that
probably wouldn't need an architecture
that swaps out its goals to force it to
explore it could just pursue exploration
as an instrumental goal and in fact we'd
expect exploration to be a convergent
instrumental goal and if you don't know
what that means watch the video on
instrumental convergence but basically a
general intelligence should choose
exploratory actions just as a normal
part of pursuing its goals rather than
having exploration hard-coded into the
system's architecture such a system
should be able to find ways to learn
more about vases without actually
smashing any perhaps it could read a
book or watch a video and work things
out from that so I would expect unsafe
exploration to mostly be a problem with
relatively narrow systems operating in
the real world
our current AI systems and their
immediate descendants rather than
something we need to worry about AGIs
and superintelligences doing given
that this is more of a near-term problem
it's actually relatively well explored
already people have spent some time
thinking about this so what are the
options for safe exploration well one
obvious thing to try is figuring out
what unsafe actions your system might
take while exploring and then
blacklisting those actions so let's say
you've got some kind of drone like an AI
controlled quadcopter that's flying
around and you want it to be able to
explore the different ways it could fly
but this is unsafe because the system
might explore maneuvers like flying
full-speed into the ground so what you
can do is have the system take
exploratory actions in whatever way you
usually do it but if the system enters a
region of space that's too close to the
ground another system detects that and
overrides the learning algorithm flying
the quadcopter higher and then handing
control back to the learning algorithm
again kind of like the second set of
controls they use when training humans
to safely operate vehicles now bear in
mind that here for simplicity I'm
talking about blacklisting unsafe
regions of the physical space that the
quadcopter is in but really this
approach is broader than that
you're really blacklisting unsafe
regions of the configuration space for
the agent in its environment it's not
just about navigating a physical space
your system is navigating an abstract
space of possibilities and you can have
a safety subsystem that takes over if
the system tries to enter an unsafe
region of that space this can work quite
well as long as you know all of the
unsafe things your system might do and
how to avoid them like ok now it's not
going to hit the ground but it could
still hit a tree so your system would
have to also keep track of where the
trees are and have a routine for safely
moving out of that area as well but the
more complex the problem is the harder
it is to list out and specify every
possible unsafe region of the space so
given that it might be extremely hard to
specify every region of unsafe behavior
you could try the opposite specify a
region of safe behavior you could say ok
the safe zone is anywhere above this
altitude the height of the tallest
obstacles you might hit and below this
altitude like the altitude of the lowest
aircraft you might hit and within this
boundary which is like the border of
some empty field somewhere anywhere in
this space is considered to be safe so
the system explores as usual in this
area and if it ever moves outside the
area the safety subsystem overrides it
and takes it back into the safe area
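
A minimal sketch of that override, assuming the safe zone is an axis-aligned box and positions are `(x, y, altitude)` tuples. The names `in_box`, `clamp_to_box` and `safe_step` are illustrative, and pulling the position back to the nearest point in the box stands in for whatever recovery routine a real safety subsystem would fly.

```python
def in_box(pos, low, high):
    """True if every coordinate of `pos` lies within [low, high]."""
    return all(l <= p <= h for p, l, h in zip(pos, low, high))


def clamp_to_box(pos, low, high):
    """Nearest point to `pos` that is inside the box."""
    return tuple(min(max(p, l), h) for p, l, h in zip(pos, low, high))


def safe_step(pos, move, low, high):
    """Apply the learner's proposed move unless it leaves the safe box.

    Returns (new_position, overridden). When the proposed position is
    outside the box, the safety subsystem takes over and returns a
    position back inside it.
    """
    proposed = tuple(p + d for p, d in zip(pos, move))
    if in_box(proposed, low, high):
        return proposed, False  # learner stays in control
    return clamp_to_box(proposed, low, high), True  # safety override
```

For example, with `low = (0, 0, 5)` and `high = (10, 10, 50)`, a dive from altitude 10 by 8 units is overridden and the quadcopter ends up at the altitude floor of 5. The learner can explore however it likes inside the box; the hard part is choosing the box.
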
specifying a whitelisted area can be
safer than specifying blacklisted areas
because you don't need to think of every
possible bad thing that can happen you
just need to find a safe region the
problem is your ability to check the
space and ensure that it's safe is
limited again this needn't be a physical
space it's a configuration space and as
the system becomes more and more
complicated the configuration space
becomes much larger so the area that
you're able to really know is safe
becomes a smaller and smaller proportion
of the actual available configuration
space this means you might be severely
limiting what your system can do since
it can only explore a small corner of
the options if you try to make your safe
region larger than the area that you're
able to properly check you risk
including some dangerous configurations
so your system can then behave
unsafely but if you limit the safe region
to the size that you're able to actually
confirm is safe your systems will be
much less capable since there are
probably all kinds of good strategies
that it's never going to be able to find
because they happen to lie outside of
the space despite being perfectly safe
the extreme case of this is where you
have an expert demonstration and then
you have the system just try to copy
what the expert did as closely as
possible or perhaps you allow some small
region of deviation from the expert
demonstration but that system is never
going to do much better than the human
expert because it can't try anything too
different from what humans do in this
case you've removed almost all of the
problems of safe exploration by removing
almost all of the exploration so you can
see this is another place where we have
a trade-off between safety and
capability all right what other
approaches are available well human
oversight is one that's often used
self-driving cars have a human in them
who can override the system in principle
you can do the same with exploration
have the system check with a human
before doing each exploratory action but
as we talked about in the scalable
supervision video this doesn't scale
very well the system might need to make
millions of exploratory actions and it's
not practical to have a human check all
of those or it might be a high speed
system that needs inhumanly fast
oversight if you need to make decisions
about exploration in a split second a
human will be too slow to provide that
supervision so there's a synergy there
if we can improve the scalability of
human supervision that could help with
safe exploration as well and the last
approach I'm going to talk about is
simulation this is a very popular
approach and it works quite well if you
do your exploration in a simulation then
even if it goes horribly wrong it's not
a problem you can crash your simulated
quadcopter right into your own simulated
face and it's no big deal the problems
with simulation probably deserve a whole
video to themselves but basically
there's always a simulation gap it's
extremely difficult to get simulations
that accurately represent the problem
domain and the more complex the problem
is the harder this becomes so learning
in a simulation can limit the
capabilities of your AI system for
example when researchers were trying to
see if an evolutionary algorithm could
invent an electronic oscillator a
circuit that would generate a signal
that repeats at a particular frequency
their system developed a very weird
looking thing that clearly was not an
oscillator circuit but which somehow
mysteriously produced a good oscillating
output anyway now you would think it was
a bug in the simulation but they weren't
using a simulation
the circuits physically existed this
circuit produced exactly the output
they'd asked for but they had no idea
how it did it eventually they figured
out that it was actually a radio it was
picking up the very faint radio signals
put out by the electronics of a nearby
computer and using that to generate the
correct signal the point is this is a
cool unexpected solution to the problem
which would almost certainly not have
been found in a simulation I mean would
you think to include ambient radio noise
in your oscillator circuit simulation by
doing its learning in a simulator a
system is only able to use the aspects
of the world that we think are important
enough to include in the simulation
which limits its ability to come up with
things that we wouldn't have thought of
and that's a big part of why we want
such systems in the first place
and this goes the other way as well of
course it's not just that things in
reality may be missing from your
simulation but your simulation will
probably have some things that reality
doesn't ie bugs the thing that makes
this worse is that if you have a smart
AI system it's likely to end up actually
seeking out the inaccuracies in your
simulation because the best solutions
are likely to involve exploiting those
bugs like if your physics simulation has
any bugs in it there's a good chance
those bugs can be exploited to violate
conservation of momentum or to get free
energy or whatever so it's not just that
the simulation may not be accurate to
reality it's that most of the best
solutions will lie in the parts of the
configuration space where the simulation
is the least accurate to reality the
general tendency for optimization to
find the edges of systems to find their
limits means that it's hard to be confident
that actions which seem safe in a
simulation will actually be safe in
reality at the end of the day
exploration is inherently risky because
almost by definition it involves trying
things without knowing exactly how it'll
turn out but there are ways of managing
and minimizing that risk and we need to
find them so that our AI systems can
explore safely
I want to end this video by saying thank
you so much to my amazing patrons it's
all of these people here and in this
video I especially want to thank Scott
Worley thank you all so much for
sticking with me through this giant gap
in uploads when I do upload videos to
this channel or the second channel
patrons get to see them a few days
before everyone else and I'm also
posting the videos I make for the online
AI safety course that I'm helping to
develop and occasional behind-the-scenes
videos too like right now I'm putting
together a video about my visit to the
electromagnetic field festival this year
where I gave a talk and actually met
some of you in person which was fun
anyway thank you again for your support
and thank you all for watching I'll see
you soon