Description: Keras implementation of the REINFORCE algorithm for reinforcement learning.

# Policy Gradient

Minimal implementation of the stochastic policy gradient algorithm in Keras.

## Pong Agent

This PG agent seems to win more frequently after about 8000 episodes. Below is the score graph.

Uploaded by kkatnv | Size: 6291456
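As a rough illustration of the REINFORCE update named in the description, the sketch below computes per-step discounted returns for one episode and the surrogate loss that a policy gradient agent minimizes. It uses plain NumPy rather than Keras so it can run on its own. The function names `discount_rewards` and `reinforce_loss` and the default `gamma=0.99` are assumptions made for illustration; they are not taken from the repository.

```python
import numpy as np


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode.

    gamma is a hypothetical default, not a value taken from the repository.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk the episode backwards so each step accumulates all future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(action_probs, actions, returns):
    """REINFORCE surrogate loss: -mean(log pi(a_t | s_t) * G_t).

    action_probs: array of shape (T, num_actions), the policy output per step.
    actions:      integer array of shape (T,), the actions actually taken.
    returns:      array of shape (T,), e.g. the output of discount_rewards.
    """
    action_probs = np.asarray(action_probs, dtype=np.float64)
    actions = np.asarray(actions)
    returns = np.asarray(returns, dtype=np.float64)
    # Probability the policy assigned to each action that was taken.
    chosen = action_probs[np.arange(len(actions)), actions]
    return -np.mean(np.log(chosen) * returns)


if __name__ == "__main__":
    episode_rewards = [0.0, 0.0, 1.0]  # a win scored on the last step
    g = discount_rewards(episode_rewards, gamma=0.5)
    probs = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
    print(g, reinforce_loss(probs, [0, 1, 0], g))
```

In a Keras model, a comparable effect is often obtained by training with a categorical cross-entropy loss against the one-hot taken actions and passing the returns as `sample_weight`; whether this repository does so is not stated in the description.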