Better Gradient Steps for Deep On-Policy Reinforcement Learning

Aug 1, 2024

Ryan Pégoud

Thibault Lahire

Abstract

In deep on-policy reinforcement learning, algorithms collect transitions by executing the current policy in the environment. These transitions are then used in a gradient ascent step that updates the parameters of the neural network(s). This article studies how the collected transitions can be prioritized to speed up gradient ascent toward a favorable policy. To do so, we weight each transition in the gradient ascent update by its per-sample gradient norm, which measures how much that transition can change the neural network.
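To make the idea concrete, here is a minimal sketch of one prioritized policy-gradient step. It is an illustration under stated assumptions, not the paper's implementation: the policy is a small linear softmax rather than a deep network, the advantages are given, and the weights are taken to be the per-sample gradient norms normalized to sum to one (the abstract does not specify the exact weighting scheme). The names `per_sample_gradient`, `grad_norm`, and `prioritized_step` are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def per_sample_gradient(theta, state, action, advantage):
    # Gradient of advantage * log pi(action | state) w.r.t. theta,
    # where theta[a][j] links feature j to the logit of action a.
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    probs = softmax(logits)
    grad = []
    for a, p in enumerate(probs):
        coeff = advantage * ((1.0 if a == action else 0.0) - p)
        grad.append([coeff * x for x in state])
    return grad

def grad_norm(grad):
    # Euclidean norm of the flattened per-sample gradient.
    return math.sqrt(sum(g * g for row in grad for g in row))

def prioritized_step(theta, batch, lr=0.1, eps=1e-8):
    # Assumption: weights are gradient norms normalized over the batch.
    grads = [per_sample_gradient(theta, s, a, adv) for s, a, adv in batch]
    norms = [grad_norm(g) for g in grads]
    total = sum(norms) + eps
    weights = [n / total for n in norms]
    new_theta = [row[:] for row in theta]
    for w, g in zip(weights, grads):
        for a, row in enumerate(g):
            for j, v in enumerate(row):
                new_theta[a][j] += lr * w * v  # gradient ascent
    return new_theta, weights

# Two transitions: (state features, action taken, advantage estimate).
theta = [[0.0, 0.0], [0.0, 0.0]]
batch = [([1.0, 0.0], 0, 2.0), ([0.0, 1.0], 1, 0.5)]
new_theta, weights = prioritized_step(theta, batch)
```

In this toy batch the first transition has the larger gradient norm, so it receives the larger weight and dominates the update. With a deep network, the per-sample gradients would typically come from an automatic differentiation library instead of the closed form used here.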

Type

Workshop Paper

Publication

In Aligning Reinforcement Learning Experimentalists and Theorists Workshop at International Conference on Machine Learning

Last updated on Aug 1, 2024

Reinforcement Learning, Sample Efficiency for RL

Authors

Ryan Pégoud

PhD Candidate (BMW ProMotion)
