Proximal Deterministic Policy Gradient

robot,IROS 2020

Marco Maggipinto, Gian Antonio Susto, Pratik Chaudhari

This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the optimization variable and the value network computes the proximal operator. Second, we exploit the two value functions commonly employed in state-of-the-art off-policy algor...
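To make the proximal point idea concrete, here is a minimal sketch on a toy scalar problem. It is not the paper's algorithm, which this excerpt does not give; the function names (`prox_step`, `proximal_point`, `grad_quadratic`), the step sizes, and the quadratic objective are all assumptions chosen for illustration. By loose analogy with the abstract, the outer variable `x` stands in for the target network, and the inner gradient loop that approximates the proximal operator stands in for the value network update.

```python
def prox_step(grad_f, x_anchor, lam, lr=0.1, inner_steps=200):
    """Approximate prox_{lam * f}(x_anchor) by gradient descent on
    f(y) + (1 / (2 * lam)) * (y - x_anchor) ** 2."""
    y = x_anchor
    for _ in range(inner_steps):
        # Gradient of the objective plus the proximal (quadratic) penalty.
        g = grad_f(y) + (y - x_anchor) / lam
        y -= lr * g
    return y


def proximal_point(grad_f, x0, lam=1.0, outer_steps=50):
    """Proximal point iteration: x_{k+1} = prox_{lam * f}(x_k)."""
    x = x0
    for _ in range(outer_steps):
        x = prox_step(grad_f, x, lam)
    return x


def grad_quadratic(y, c=3.0):
    """Gradient of the toy objective f(y) = 0.5 * (y - c) ** 2 (hypothetical)."""
    return y - c


if __name__ == "__main__":
    print(proximal_point(grad_quadratic, x0=0.0))
```

For this quadratic the proximal operator has the closed form (lam * c + x) / (1 + lam), so each outer step moves `x` part of the way toward the minimizer `c`; a larger `lam` takes bigger steps, a smaller one keeps `x` closer to its previous value.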
