Hands-On Intelligent Agents with OpenAI Gym

上QQ阅读APP看书，第一时间看更新

Practical applications of reinforcement and deep reinforcement learning algorithms

Until recently, practical applications of reinforcement learning and deep reinforcement learning were limited, due to sample complexity and instability. But, these algorithms proved to be quite powerful in solving some really hard practical problems. Some of them are listed here to give you an idea:

Learning to play video games better than humans: This news has probably reached you by now. Researchers at DeepMind and others developed a series of algorithms, starting with DeepMind's Deep-Q-Network, or DQN for short, which reached human-level performance in playing Atari games. We will actually be implementing this algorithm in a later chapter of this book! In essence, it is a deep variant of the Q-learning algorithm we briefly saw in this chapter, with a few changes that increased the speed of learning and the stability. It was able to reach human-level performance in terms of game scores after several games. What is more impressive is that the same algorithm achieved this level of play without any game-specific fine-tuning or changes!

Mastering the game of Go: Go is a Chinese game that has challenged AI for several decades. It is played on a full-size 19 x 19 board and is orders of magnitude more complex than chess because of the large number () of possible board positions. Until recently, no AI algorithm or software was able to play anywhere close to the level of humans at this game. AlphaGo—the AI agent from DeepMind that uses deep reinforcement learning and Monte Carlo tree search—changed this all and beat the human world champions Lee Sedol (4-1) and Fan Hui (5-0). DeepMind released more advanced versions of their AI agent, named AlphaGO Zero (which uses zero human knowledge and learned to play all by itself!) and AlphaZero (which could play the games of Go, chess, and Shogi!), all of which used deep reinforcement learning as the core algorithm.
Helping AI win Jeopardy!: IBM's Watson—an AI system developed by IBM, which came to fame by beating humans at Jeopardy!—used an extension of TD learning to create its daily-double wagering strategies that helped it to win against human champions.
Robot locomotion and manipulation: Both reinforcement learning and deep reinforcement learning have enabled the control of complex robots, both for locomotion and navigation. Several recent works from the researchers at UC Berkeley have shown how, using deep reinforcement, they train policies that offer vision and control for robotic manipulation tasks and generate join actuations for making a complex bipedal humanoid walk and run.