Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policy Optimization

Souradip Chakraborty, Amrit Singh Bedi, Kasun Weerakoon, Prithvi Poddar, Alec Koppel, Pratap Tokekar, Dinesh Manocha

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse rewards are common in continuous control robotics tasks such as manipulation and navigation, and they make the learning problem hard because value functions are non-trivial to estimate over the state space. This demands either ...