File name:
Deep.Reinforcement.Learning.Han.-.Maxim.Lapan.pdf
Development tool:
File size: 12 MB
Downloads: 0
Upload date: 2019-08-18
Description: Deep Reinforcement Learning Hands-On
by Maxim Lapan
Table of Contents
Deep Reinforcement Learning Hands-On
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. What is Reinforcement Learning?
Learning - supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary
2. OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality - wrappers and monitors
Wrappers
Monitor
Summary
3. Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue - loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example - GAN on Atari images
Summary
4. The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
5. Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
6. Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
Interaction with the environment
SGD optimization
Correlation between steps
The Markov property
The final form of DQN training
DQN on Pong
Wrappers
DQN model
Training
Running and performance
Your model in action
Summary
7. DQN Extensions
The PyTorch Agent Net library
Agent
Agent's experience
Experience buffer
Gym env wrappers
Basic DQN
N-step DQN
Implementation
Double DQN
Implementation
Results
Noisy networks
Implementation
Results
Prioritized replay buffer
Implementation
Results
Dueling DQN
Implementation
Results
Categorical DQN
Implementation
Results
Combining everything
Implementation
Results
Summary
References
8. Stocks Trading Using RL
Trading
Data
Problem statements and key decisions
The trading environment
Model
Training code
Results
The feed-forward model
The convolution model
Things to try
Summary
9. Policy Gradients - An Alternative
Values and policy
Why policy?
Policy representation
Policy gradients
The REINFORCE method
The CartPole example
Results
Policy-based versus value-based methods
REINFORCE issues
Full episodes are required
High gradients variance
Exploration
Correlation between samples
PG on CartPole
Results
PG on Pong
Results
Summary
10. The Actor-Critic Method
Variance reduction
CartPole variance
Actor-critic
A2C on Pong
A2C on Pong results
Tuning hyperparameters
Learning rate
Entropy beta
Count of environments
Batch size
Summary
11. Asynchronous Advantage Actor-Critic
Correlation and sample efficiency
Adding an extra A to A2C
Multiprocessing in Python
A3C - data parallelism
Results
A3C - gradients parallelism
Results
Summary
12. Chatbots Training with RL
Chatbots overview
Deep NLP basics
Recurrent Neural Networks
Embeddings
Encoder-Decoder
Training of seq2seq
Log-likelihood training
Bilingual evaluation understudy (BLEU) score
RL in seq2seq
Self-critical sequence training
The chatbot example
The example structure
Modules: cornell.py and data.py
BLEU score and utils.py
Model
Training: cross-entropy
Running the training
Checking the data
Testing the trained model
Training: SCST
Running the SCST training
Results
Telegram bot
Summary
13. Web Navigation
Web navigation
Browser automation and RL
Mini World of Bits benchmark
OpenAI Universe
Installation
Actions and observations
Environment creation
MiniWoB stability
Simple clicking approach
Grid actions
Example overview
Model
Training code
Starting containers
Training process
Checking the learned policy
Issues with simple clicking
Human demonstrations
Recording the demonstrations
Recording format
Training using demonstrations
Results
TicTacToe problem
Adding text description
Results
Things to try
Summary
14. Continuous Action Space
Why a continuous space?
Action space
Environments
The Actor-Critic (A2C) method
Implementation
Results
Using models and recording videos
Deterministic policy gradients
Exploration
Implementation
Results
Recording videos
Distributional policy gradients
Architecture
Implementation
Results
Things to try
Summary
15. Trust Regions - TRPO, PPO, and ACKTR
Introduction
Roboschool
A2C baseline
Results
Videos recording
Proximal Policy Optimization
Implementation
Results
Trust Region Policy Optimization
Implementation
Results
A2C using ACKTR
Implementation
Results
Summary
16. Black-Box Optimization in RL
Black-box methods
Evolution strategies
ES on CartPole
Results
ES on HalfCheetah
Results
Genetic algorithms
GA on CartPole
Results
GA tweaks
Deep GA
Novelty search
GA on Cheetah
Results
Summary
References
17. Beyond Model-Free - Imagination
Model-based versus model-free
Model imperfections
Imagination-augmented agent
The environment model
The rollout policy