What is behavioral cloning?
Behavioral cloning is a form of imitation learning. It involves gathering observations of the behavior of an "expert demonstrator" who is good at the task being trained for, and then using supervised learning to train an AI agent (a system that can be understood as taking actions towards achieving a goal) to imitate that behavior.
Behavioral cloning differs from other forms of imitation learning, such as inverse reinforcement learning, which first infers a reward function from the demonstrations and then trains the agent with reinforcement learning (a machine learning method in which the machine gets rewards based on its actions, and is adjusted to be more likely to take actions that lead to high reward). Behavioral cloning skips reward modeling entirely and learns a policy directly from the demonstrations using supervised learning.
Behavioral cloning was originally developed to train self-driving cars, and this use case serves as a good example of how behavioral cloning works:
- First, while a human "demonstrator" drives a car, we collect data about 1) the states of the environment (using sensors such as cameras and lidar) and 2) the actions that the demonstrator takes in each environmental state (such as steering, accelerating/braking, and gear shifting).
- Next, we create a dataset consisting of (state, action) pairs.
- Finally, we use supervised learning to train a model that takes the environmental state as an input and predicts the driver’s action.
When the accuracy of this model is high enough, we can say that the driver’s behavior has been “cloned”.
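The three steps above can be sketched in code. This is a hypothetical toy setup, not a real driving stack: a scripted "expert" maps a 2-D state to one of two actions, and we fit a simple logistic-regression policy on the resulting (state, action) pairs with plain supervised learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Toy expert policy (an assumption for illustration): steer right (1)
    # when the state drifts past a linear boundary, otherwise steer left (0).
    return 1 if state[0] + 0.5 * state[1] > 0 else 0

# 1) Collect demonstrations: environment states plus the expert's action in each.
states = rng.normal(size=(2000, 2))
actions = np.array([expert_action(s) for s in states])

# 2) The dataset is just these (state, action) pairs.
# 3) Supervised learning: fit a model that predicts the expert's action
#    from the state (logistic regression trained by gradient descent).
w, b = np.zeros(2), 0.0
for _ in range(500):
    logits = states @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    grad = probs - actions                      # gradient of cross-entropy loss
    w -= 0.1 * (states.T @ grad) / len(states)
    b -= 0.1 * grad.mean()

# The behavior is "cloned" when held-out accuracy is high enough.
test_states = rng.normal(size=(500, 2))
test_actions = np.array([expert_action(s) for s in test_states])
preds = (test_states @ w + b > 0).astype(int)
accuracy = (preds == test_actions).mean()
print(f"imitation accuracy: {accuracy:.2f}")
```

Note that the clone only sees states the expert visited; in states far outside the demonstration distribution, its predictions can be arbitrarily bad, which is a well-known weakness of behavioral cloning.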
Behavioral cloning is also sometimes used to fine-tune language models (AI models that take in some text and predict how the text is most likely to continue). Fine-tuning is the process of adapting a pre-trained ML model for more specific tasks or to display more specific behaviors.
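In the language-model setting, the (state, action) framing still applies: the "state" is the text so far and the "action" is the next token. The sketch below is a deliberately tiny, hypothetical illustration of that framing using bigram counts; real fine-tuning instead updates a neural network's weights by gradient descent on the same kind of next-token supervision.

```python
from collections import Counter, defaultdict

# Hypothetical demonstration text standing in for expert-written data.
demonstrations = "the cat sat on the mat and the cat slept".split()

# Build (state, action) pairs: previous token -> next token.
pairs = list(zip(demonstrations, demonstrations[1:]))

# "Train" by counting which action the demonstrator took in each state.
counts = defaultdict(Counter)
for state, action in pairs:
    counts[state][action] += 1

def policy(state):
    # Imitate the demonstrator: pick the action taken most often in this state.
    return counts[state].most_common(1)[0][0]

print(policy("the"))  # prints "cat", the most common continuation of "the"
```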
Sources
- Stanford CS234: Reinforcement Learning (2019), Lecture 7 - Imitation Learning
- Berkeley CS182: Reinforcement Learning (2021), Lecture 14 - Imitation Learning
- leogao (2021). "Behavior Cloning is Miscalibrated"
- Ortega, Pedro, et al. (2021). "Shaking the foundations: delusions in sequence models for interaction and control"
- Zhou, Chunting, et al. (2020). "Detecting Hallucinated Content in Conditional Neural Sequence Generation"
- Xiao, Yijun, and Wang, William (2021). "On Hallucination and Predictive Uncertainty in Conditional Language Generation"