Projection-Based Fast and Safe Policy Optimization for Reinforcement Learning

Shijun Lin, Hao Wang, Ziyang Chen, Zhen Kan

While reinforcement learning (RL) attracts increasing research attention, maximizing the return while keeping the agent safe remains an open problem. To address this challenge, this work proposes a new Fast and Safe Policy Optimization (FSPO) algorithm, which consists of three steps: the first step performs a reward improvement update, the second step projects the policy t...