Rewards
A reward, denoted by $r_t$, is usually a scalar quantity that is provided as feedback to the agent to drive its learning. It indicates how well the agent is doing at time step $t$, and the goal of the agent is to maximize the cumulative reward, that is, the sum of the rewards it receives over time. The following examples of reward signals for different tasks may help you get a more intuitive understanding:
- For the Atari games we discussed before, or any computer games in general, the reward signal can be +1 for every increase in score and -1 for every decrease in score.
- For stock trading, the reward signal can be +1 for each dollar gained and -1 for each dollar lost by the agent.
- For driving a car in simulation, the reward signal can be +1 for every mile driven and -100 for every collision.
- Sometimes, the reward signal can be sparse. For example, for a game of chess or Go, the reward signal could be +1 if the agent wins the game and -1 if the agent loses the game. The reward is sparse because the agent receives the reward signal only after it completes one full game, so it gets no direct feedback on how good each individual move was.
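
To make the difference between a dense and a sparse reward signal concrete, here is a minimal Python sketch of two of the examples above. The function names (`driving_reward`, `sparse_game_rewards`, `episode_return`) are hypothetical and chosen only for illustration. Representing "no feedback yet" as a reward of 0 on every move before the last one is an assumption, not something prescribed by the text.

```python
from typing import List


def driving_reward(miles_driven: float, collisions: int) -> float:
    """Dense reward for one time step: +1 per mile driven, -100 per collision."""
    return 1.0 * miles_driven - 100.0 * collisions


def sparse_game_rewards(num_moves: int, agent_won: bool) -> List[float]:
    """Sparse reward for one full game of chess or Go.

    Assumption: every move before the last gets a reward of 0, and only the
    final move carries the outcome (+1 for a win, -1 for a loss).
    """
    if num_moves < 1:
        raise ValueError("a game must contain at least one move")
    rewards = [0.0] * num_moves
    rewards[-1] = 1.0 if agent_won else -1.0
    return rewards


def episode_return(rewards: List[float]) -> float:
    """The quantity the agent tries to maximize: the sum of rewards."""
    return sum(rewards)


# Dense signal: the agent gets feedback at every time step.
driving_rewards = [
    driving_reward(miles_driven=1.0, collisions=0),
    driving_reward(miles_driven=1.0, collisions=0),
    driving_reward(miles_driven=1.0, collisions=1),
]
print(driving_rewards)                  # [1.0, 1.0, -99.0]
print(episode_return(driving_rewards))  # -97.0

# Sparse signal: only the last move of the game carries information.
game_rewards = sparse_game_rewards(num_moves=5, agent_won=True)
print(game_rewards)                     # [0.0, 0.0, 0.0, 0.0, 1.0]
print(episode_return(game_rewards))     # 1.0
```

In the driving case, every step tells the agent something about its behavior. In the game case, all five moves share a single outcome, which is why learning from sparse rewards tends to be harder.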