File size: 2 MB
Uploaded: 2019-08-24
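Description: This courseware covers the basic problems of reinforcement learning, classical Q-learning theory, deep Q-learning theory, and a walkthrough of the program and its training.
References for reinforcement learning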
01 Online resources
https://www.intelnervana.com/demystifying-deep-reinforcement-learning/
http://artint.info/html/artint265.html
02 Papers
Playing Atari with Deep Reinforcement Learning (2013), arXiv:1312.5602v1
Continuous control with deep reinforcement learning (2016), arXiv:1509.02971v5
Human-level control through deep reinforcement learning (2015), Nature (nature14236)
Mastering the game of Go without human knowledge (2017), Nature (nature24270)
03 Videos and online courses
http://videolectures.net/rldm2015silverreinforcementlearning
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
http://rll.berkeley.edu/deeprlcourse/
Contents
01 Basic problems of reinforcement learning
02 Classical Q-learning theory
03 Deep Q-learning theory
04 Program walkthrough and training
PART 01
Basic Problems of Reinforcement Learning
Problems of Learning
Basic problems of reinforcement learning
https://www.technologyreview.com/s/603029/a-3-d-world-for-smarter-ai-agents/
[Figure (DeepMind): agent with the auxiliary tasks Pixel Control, Reward Prediction and Value Function Replay during live play]

01 State (s): the current state (image, sound, vector, etc.)
02 Reward (r): reward and incentive
03 Action (a): the available actions (forward, backward, left, right)
04 Value function (Q): the evaluation function Q(s, a)
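The four elements above fit together in a single interaction loop. The agent observes a state s, picks an action a, and receives a reward r and the next state. A value function Q(s, a) keeps one estimate per state-action pair. Below is a minimal Python sketch of that loop. The 4x4 grid, the goal cell and the reward of 1 at the goal are hypothetical choices for illustration, as is the mapping of "forward" to "up one row". None of these come from the courseware. The random policy is a placeholder for a learned one.

```python
import random

# Hypothetical toy environment (not from the courseware): a 4x4 grid where the
# state s is the agent's (row, col) cell and the goal is the bottom-right cell.
ACTIONS = ["forward", "backward", "left", "right"]
# Assumption: "forward" means moving up one row, "backward" down one row.
MOVES = {"forward": (-1, 0), "backward": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)


def step(state, action):
    """Apply an action and return (next_state, reward r, done)."""
    dr, dc = MOVES[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)  # walls keep the agent on the grid
    col = min(max(state[1] + dc, 0), SIZE - 1)
    next_state = (row, col)
    reward = 1.0 if next_state == GOAL else 0.0  # reward only at the goal
    return next_state, reward, next_state == GOAL


# Value function Q(s, a): one number per (state, action) pair, initialised to 0.
Q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}


def run_episode(max_steps=100, seed=0):
    """Roll out a uniformly random policy, recording (s, a, r, s') transitions."""
    rng = random.Random(seed)
    state, transitions = (0, 0), []
    for _ in range(max_steps):
        action = rng.choice(ACTIONS)
        next_state, reward, done = step(state, action)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions


if __name__ == "__main__":
    episode = run_episode()
    print(len(episode), "steps; total reward =", sum(t[2] for t in episode))
```

In a learning agent, each recorded (s, a, r, s') transition would be used to update the matching entry of `Q`. The classical Q-learning part of the course addresses that update.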
Basic problems of reinforcement learning
Supervised Learning: dense labels
Reinforcement Learning: sparse, time-delayed labels, e.g. Label(state i, action k) = [good, bad]
Unsupervised Learning: no labels
Supervised or unsupervised? That is the question.
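The difference in feedback can be shown in a few lines. A supervised dataset attaches a label to every example. An RL episode may yield a single reward only at the very end. One common way to turn that delayed signal into a per-step target is the discounted return G_t = r_t + γ·G_{t+1}, where r_t is the reward received after step t. The minimal Python sketch below computes it. The discount γ = 0.9 and the five-step episode are assumptions for illustration.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = rewards[t] + gamma * G_{t+1}, computed from the last step backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Supervised learning would supply a label for each of these five steps (dense).
# Here a single reward arrives only after the last step (sparse, time-delayed).
episode_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
print([round(g, 4) for g in discounted_returns(episode_rewards)])
# → [0.6561, 0.729, 0.81, 0.9, 1.0]
```

Every step now carries a target, but the target is derived from a reward that arrived later. This is why RL labels are described as time-delayed rather than absent.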
Basic problems of reinforcement learning
Two major problems
01 Explore-Exploit Dilemma (algorithm: ε-greedy)
The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future.
Dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.
02 Credit Assignment Problem
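The ε-greedy rule named on this slide balances the two pressures with a single parameter ε. With probability ε the agent explores by picking a random action; otherwise it exploits the action with the highest current estimate. The minimal Python sketch below assumes Q(s, ·) for the current state is given as a list of floats. The function name `epsilon_greedy` and the Q values in the demo are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from the Q(s, a) estimates of one state.

    With probability epsilon, explore: choose an action uniformly at random.
    Otherwise, exploit: choose the action with the highest estimate,
    breaking ties at random so no tied action is permanently favoured.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


if __name__ == "__main__":
    rng = random.Random(42)
    q = [0.1, 0.5, 0.2, 0.0]  # hypothetical Q(s, ·) for four actions
    picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
    print("greedy action share:", picks.count(1) / len(picks))
```

With ε = 0.1 and four actions, the greedy action should be chosen roughly 0.9 + 0.1/4 = 92.5% of the time, because random exploration also lands on it occasionally. Setting ε = 0 gives pure exploitation and ε = 1 pure exploration. Either extreme is the "pursued exclusively" failure mode described above.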
Basic problems of reinforcement learning
02 Credit Assignment Problem (algorithm: value function, Q-learning)
[Figure: a chain of actions ordered 1st to 5th, e.g. take a car, take a bike, open door, sit down, shift in seat, point at menu, tap the table]
What chain of actions resulted in the reward? Which of the actions to the right got you your steak?
Which of the preceding actions was responsible for getting the reward, and to what extent?
"The full moon of the fifteenth shines on our hometown and on the frontier ... half of it is yours, and half of it is mine." (lyrics from the Chinese song "The Moon on the Fifteenth")
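The slide names value functions and Q-learning as the remedy for credit assignment. The standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]. Each update passes value from the next state one step back, so over many episodes the early actions receive discounted credit for a late reward. The minimal Python sketch below runs this on a hypothetical five-state chain. The environment, α = 0.5, γ = 0.9 and ε = 0.1 are assumptions, not values from the courseware.

```python
import random

# Hypothetical 5-state chain (not from the courseware): states 0..4, actions
# 0 = "left", 1 = "right". Only reaching state 4 pays a reward of 1, so the
# earlier "right" moves must earn their share of the credit indirectly.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters


def step(s, a):
    """Move along the chain; return (next_state, reward, done)."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy; ties are broken at random
            if rng.random() < EPSILON or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q


if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning()[:GOAL]):
        print(f"state {s}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

After training, Q(s, right) should approach γ^(3−s) for states 0 to 3. That gives 1.0, 0.9, 0.81 and 0.729 from state 3 back to state 0. The move that directly reached the goal gets the most credit, and each earlier move gets a geometrically smaller share, which answers "to what extent" in a concrete way.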
PART 02
Classical Q-Learning Theory
Classical Q-Learning