Better Gradient Steps for Deep On-Policy Reinforcement Learning

Aug 1, 2024
Ryan Pégoud, Thibault Lahire
Abstract
In deep on-policy reinforcement learning, algorithms collect transitions by executing the current policy in the environment. These transitions are then used in a gradient ascent step that updates the parameters of the neural network(s). This article studies how the collected transitions can be prioritized to speed up gradient ascent toward a favorable policy. To do so, we weight the transitions in the gradient ascent update by their per-sample gradient norms, which measure how much change each transition can induce in the neural network.
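The idea of weighting transitions by per-sample gradient norms can be illustrated with a small sketch. The abstract does not give the exact weighting rule, so everything below is an assumption made for illustration: a two-action policy with pi(a=1 | s) = sigmoid(theta . s), precomputed advantages, and weights obtained by normalizing the per-sample gradient norms so they sum to one. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def per_sample_gradients(theta, states, actions, advantages):
    """One gradient vector per transition: grad_theta of A_i * log pi(a_i | s_i)."""
    grads = []
    for s, a, adv in zip(states, actions, advantages):
        p = sigmoid(sum(t * x for t, x in zip(theta, s)))  # pi(a=1 | s)
        scale = adv * (a - p)  # derivative of the Bernoulli log-likelihood w.r.t. the logit
        grads.append([scale * x for x in s])
    return grads


def norm_weights(grads, eps=1e-12):
    """Assumed weighting rule: per-sample L2 gradient norms, normalized to sum to ~1."""
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    total = sum(norms) + eps  # eps avoids division by zero when all gradients vanish
    return [n / total for n in norms]


def weighted_ascent_step(theta, states, actions, advantages, lr=0.1):
    """Gradient ascent step in which each transition is weighted by its gradient norm."""
    grads = per_sample_gradients(theta, states, actions, advantages)
    weights = norm_weights(grads)
    step = [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(len(theta))]
    new_theta = [t + lr * d for t, d in zip(theta, step)]
    return new_theta, weights


# Toy batch of three transitions (made-up numbers, for illustration only).
theta = [0.0, 0.0]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
actions = [1, 0, 1]
advantages = [1.0, 0.5, 0.0]

new_theta, weights = weighted_ascent_step(theta, states, actions, advantages)
print("weights:", weights)
print("updated theta:", new_theta)
```

In this toy batch, the transition with zero advantage has a zero gradient and therefore receives zero weight, while the transition with the largest gradient norm dominates the update. A uniform average would instead give all three transitions the same weight of 1/3.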
Publication
In Aligning Reinforcement Learning Experimentalists and Theorists Workshop at International Conference on Machine Learning