Deep Reinforcement Learning Hands-On

Maxim Lapan

Updated: 2021-06-25 20:47:21

Latest chapter: Index

Cover

Copyright information

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is Searching for Authors Like You

Preface

What this book covers

To get the most out of this book

Get in touch

Chapter 1. What is Reinforcement Learning?

Learning – supervised, unsupervised, and reinforcement

RL formalisms and relations

Markov decision processes

Summary

Chapter 2. OpenAI Gym

The anatomy of the agent

Hardware and software requirements

OpenAI Gym API

The random CartPole agent

The extra Gym functionality – wrappers and monitors

Summary

Chapter 3. Deep Learning with PyTorch

Tensors

Gradients

NN building blocks

Custom layers

Final glue – loss functions and optimizers

Monitoring with TensorBoard

Example – GAN on Atari images

Summary

Chapter 4. The Cross-Entropy Method

Taxonomy of RL methods

Practical cross-entropy

Cross-entropy on CartPole

Cross-entropy on FrozenLake

Theoretical background of the cross-entropy method

Summary

Chapter 5. Tabular Learning and the Bellman Equation

Value, state, and optimality

The Bellman equation of optimality

Value of action

The value iteration method

Value iteration in practice

Q-learning for FrozenLake

Summary

Chapter 6. Deep Q-Networks

Real-life value iteration

Tabular Q-learning

Deep Q-learning

DQN on Pong

Summary

Chapter 7. DQN Extensions

The PyTorch Agent Net library

Basic DQN

N-step DQN

Double DQN

Noisy networks

Prioritized replay buffer

Dueling DQN

Categorical DQN

Combining everything

Summary

References

Chapter 8. Stocks Trading Using RL

Trading

Data

Problem statements and key decisions

The trading environment

Models

Training code

Results

Things to try

Summary

Chapter 9. Policy Gradients – An Alternative

Values and policy

The REINFORCE method

REINFORCE issues

PG on CartPole

PG on Pong

Summary

Chapter 10. The Actor-Critic Method

Variance reduction

CartPole variance

Actor-critic

A2C on Pong

A2C on Pong results

Tuning hyperparameters

Summary

Chapter 11. Asynchronous Advantage Actor-Critic

Correlation and sample efficiency

Adding an extra A to A2C

Multiprocessing in Python

A3C – data parallelism

A3C – gradients parallelism

Summary

Chapter 12. Chatbots Training with RL

Chatbots overview