File name: 强化学习课件.pdf (Reinforcement Learning Courseware)
  Category: Deep Learning
  Development tools:
  File size: 2 MB
  Downloads: 0
  Uploaded: 2019-08-24
  Provided by: home****
 Details: This courseware covers the basic problems of reinforcement learning, classical Q-learning theory, deep Q-learning theory, and a program walkthrough with training.

Reinforcement learning reference materials

01 Web resources
  • https://www.intelnervana.com/demystifying-deep-reinforcement-learning/
  • http://artint.info/html/artint265.html

02 References
  • Playing Atari with Deep Reinforcement Learning (2013), arXiv:1312.5602v1
  • Continuous control with deep reinforcement learning (2016), arXiv:1509.02971v5
  • Human-level control through deep reinforcement learning (2015), Nature 14236
  • Mastering the game of Go without human knowledge (2017), Nature 24270

03 Videos and online courses
  • http://videolectures.net/rldm2015silverreinforcementlearning
  • http://wwwo.cs.ucl.ac.uk/staff/d.silver/web/tEachinG.html
  • http://rll.berkeley.edu/deeprlcourse/

Contents
  01 Basic problems of reinforcement learning
  02 Classical Q-learning theory
  03 Deep Q-learning theory
  04 Program walkthrough and training

PART 01 Basic Problems of Reinforcement Learning

Figure: DeepMind agent with auxiliary tasks (Live Play, Pixel Control, Reward Prediction, Value Function Replay). Source: https://www.technologyreview.com/s/603029/a-3-d-world-for-smarter-ai-agents/

The basic elements:
  01 State (s): the current state (image, sound, vector, etc.)
  02 Reward (r): rewards and incentives
  03 Action (a): the available actions (forward, backward, left, right)
  04 Value function Q(s, a): the evaluation function

Supervised or unsupervised? That is the question.
  • Supervised learning: dense labels
  • Reinforcement learning: sparse, time-delayed labels, e.g. Label(state i, action k) = [good, bad]
  • Unsupervised learning: no labels

Two major problems

Explore-Exploit Dilemma (algorithm: ε-greedy)
  The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.

Credit Assignment Problem (remedy: value function, Q-learning)
  Example: a chain of actions (1st through 5th) drawn from options such as taking a car, taking a bike, opening the door, sitting down, shifting in the seat to the right, pointing at the menu, and tapping the table. What chain of actions resulted in the reward? Which of those actions got you your steak?
  In general: which of the preceding actions was responsible for getting the reward, and to what extent?
  The slide illustrates shared credit with lines from the song 十五的月亮 ("The Moon on the Fifteenth"): "The moon on the fifteenth shines on the hometown and on the frontier; half of it is yours, and half of it is mine."

PART 02 Classical Q-Learning
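Classical Q-learning combines the two remedies named above: ε-greedy action selection for the explore-exploit dilemma, and a learned value function Q(s, a) for credit assignment. Below is a minimal tabular sketch in Python under stated assumptions. The six-cell corridor task and the helper names (step, epsilon_greedy, q_learning) are hypothetical. The values alpha = 0.1, gamma = 0.9 and epsilon = 0.1 are illustrative and are not taken from the courseware. The only reward arrives at the last cell. The standard update Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)] therefore carries that delayed reward back to the earlier moves.

```python
import random

# Hypothetical toy task (an assumption, not from the courseware): a 1-D corridor.
# The agent starts in cell 0; only reaching the last cell gives reward 1,
# so the reward is sparse and delayed, i.e. a credit-assignment setting.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit current Q estimates."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    best = max(values)
    # Break ties randomly so an all-zero table does not always pick one action.
    return rng.choice([a for a in ACTIONS if values[a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; hyperparameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target r + gamma * max_a' Q(s', a'); no bootstrapping past a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print("greedy action per non-terminal state:", greedy_policy)
```

When run as a script, it should print a greedy policy that moves right (action 1) in every non-terminal cell. The learned Q(s, right) values shrink by roughly a factor of gamma for each cell further from the goal. This is one concrete way to answer "to what extent" each earlier action deserves credit.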
