
What is reward shaping?

Reward shaping, in general, is a method that grants extra rewards for progress toward solving a task correctly, making learning easier. These rewards reinforce appropriate behaviors, since each behavior carries positive or negative consequences. For example, in a game, the aim is to win, not lose.

In machine learning, reward shaping is typically associated with reinforcement learning. In reinforcement learning, there is no labeled correct answer to each problem; instead, the agent learns appropriate behaviors through constant trial and error, guided by frequent reward feedback. For instance, in a game, playing well earns points, which is rewarded, and playing badly loses points, which is punished. In a car racing game, the agent might be rewarded in proportion to the time it holds a high rank and for overtaking other vehicles.
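The racing example above can be sketched as a shaped reward function. This is a minimal illustration, not code from any specific library; the environment quantities (`rank`, `num_cars`, `overtook`, `won`) and the bonus values are assumptions chosen for the example.

```python
def shaped_reward(rank: int, num_cars: int, overtook: bool, won: bool) -> float:
    """Sparse outcome reward plus shaping bonuses for intermediate progress.

    All parameter names and magnitudes are illustrative assumptions.
    """
    reward = 0.0
    if won:
        reward += 100.0  # sparse outcome reward: winning the race
    # Shaping term: small bonus each step for holding a high rank,
    # so the agent gets feedback long before the race ends.
    reward += (num_cars - rank) * 0.1
    if overtook:
        reward += 1.0    # shaping term: bonus for passing another car
    return reward
```

The shaping terms give the agent a dense learning signal every step, instead of a single win/lose signal at the end of the race.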

Reward-based learning does have weaknesses: it can require many interactions with the environment, which may be expensive, and when rewards are sparse, learning can be slow. Nevertheless, the underlying goal of an agent is to maximize its rewards, and reward shaping helps it work toward an optimal solution despite these obstacles.
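One well-known way to shape sparse rewards without changing which policy is optimal is potential-based shaping (Ng et al., 1999), which adds the term γΦ(s′) − Φ(s) to the environment's reward. A minimal sketch, assuming a gridworld with (x, y) states and a negative-distance potential (both illustrative assumptions):

```python
GAMMA = 0.99  # discount factor (assumed value for illustration)

def potential(state, goal) -> float:
    # Illustrative potential function: negative Manhattan distance
    # to the goal, so states nearer the goal have higher potential.
    (x, y), (gx, gy) = state, goal
    return -(abs(x - gx) + abs(y - gy))

def shaped(reward: float, state, next_state, goal) -> float:
    # Potential-based shaping: adding gamma * phi(s') - phi(s)
    # is guaranteed to preserve the optimal policy.
    return reward + GAMMA * potential(next_state, goal) - potential(state, goal)
```

Even if the environment reward is zero almost everywhere, the shaping term rewards every step that moves the agent closer to the goal, easing the sparse-reward problem mentioned above.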




© AISafety.info, 2022—2025

Aisafety.info is an Ashgro Inc Project. Ashgro Inc (EIN: 88-4232889) is a 501(c)(3) Public Charity incorporated in Delaware.