Proximal Deterministic Policy Gradient
Marco Maggipinto, Gian Antonio Susto, Pratik Chaudhari
This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the optimization variable and the value network computes the proximal operator. Second, we exploit the two value functions commonly employed in state-of-the-art off-policy algor...
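To make the proximal point view concrete, the following is a minimal sketch of a generic, deterministic proximal point iteration on a one-dimensional quadratic. It is not the authors' algorithm. The objective, the function names `prox_quadratic` and `proximal_point`, and the parameters `a`, `b`, and `lam` are all assumptions chosen for illustration. In the analogy suggested by the abstract, the iterate `x` would correspond to the target network, and evaluating the proximal operator would correspond to training the value network against a loss with a proximity term.

```python
def prox_quadratic(x_k, a, b, lam):
    """Proximal operator of f(x) = 0.5 * a * (x - b)**2 at the point x_k.

    Solves argmin_x f(x) + (1 / (2 * lam)) * (x - x_k)**2 in closed form.
    Setting the derivative a * (x - b) + (x - x_k) / lam to zero gives
    x = (lam * a * b + x_k) / (lam * a + 1).
    """
    return (lam * a * b + x_k) / (lam * a + 1.0)


def proximal_point(x0, a, b, lam, steps):
    """Run the proximal point iteration x_{k+1} = prox_{lam * f}(x_k)."""
    x = x0
    for _ in range(steps):
        # Each step minimizes f while staying close to the previous iterate.
        x = prox_quadratic(x, a, b, lam)
    return x


if __name__ == "__main__":
    # Hypothetical toy problem: the minimizer of f is x = b = 3.0.
    print(proximal_point(x0=0.0, a=2.0, b=3.0, lam=0.5, steps=100))
```

In this toy setting each step shrinks the distance to the minimizer by a factor of `1 / (lam * a + 1)`, so a larger `lam` moves faster while a smaller `lam` keeps successive iterates closer together. A stochastic variant would replace `f` with a noisy, sampled estimate at every step.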


