The original paper on Hindsight Experience Replay, suitable for beginners who want to learn about and understand Hindsight Experience Replay in deep reinforcement learning.

is to periodically set the weights of the target network to the current weights of the main network (e.g.,
Mnih et al. (2015)) or to use a Polyak-averaged (Polyak and Judits