Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

Haotian Xu, Shengjie Wang, Zhaolei Wang, Yunzhe Zhang, Qing Zhuo, Yang Gao, Tao Zhang

Reinforcement learning (RL) has achieved promising results on many robotic control tasks. Ensuring the safety of learning-based controllers is essential to their effectiveness. Current methods enforce the full constraints consistently throughout training, which results in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy O...