SafeTAC: Safe Tsallis Actor-Critic Reinforcement Learning for Safer Exploration
Dohyeong Kim, Jaeseok Heo, Songhwai Oh
Satisfying safety constraints is the top priority in safe reinforcement learning (RL). However, without proper exploration, training can produce an overly conservative policy, such as one that stays frozen at the same position. To address this, we use maximum entropy RL methods for exploration. In particular, an RL method with Tsallis entropy maximization, called Tsallis actor-critic (TAC), is used to synthesize po...
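To make the exploration objective concrete, the sketch below computes the Tsallis entropy of a discrete action distribution using the standard definition S_q(p) = (1 - Σ_i p_i^q) / (q - 1) with entropic index q > 0, which recovers the Shannon entropy as q → 1. Whether TAC uses exactly this normalization is an assumption here, and the function name `tsallis_entropy` is hypothetical. A deterministic policy (for example, one that always chooses to stay in place) scores zero, while a uniform policy scores highest, which is why an entropy bonus discourages the frozen behavior described above.

```python
import math


def tsallis_entropy(probs, q):
    """Hypothetical helper: Tsallis entropy of a discrete distribution.

    Assumes the standard form S_q(p) = (1 - sum_i p_i**q) / (q - 1), q > 0.
    As q -> 1 this tends to the Shannon entropy -sum_i p_i * ln(p_i).
    """
    if q <= 0:
        raise ValueError("entropic index q must be positive")
    # Zero-probability actions contribute nothing for q > 0, so skip them.
    support = [p for p in probs if p > 0]
    if abs(q - 1.0) < 1e-12:
        # q = 1 limit: Shannon entropy with the natural logarithm.
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # spreads probability over 4 actions
    frozen = [1.0, 0.0, 0.0, 0.0]        # always picks the same action
    for q in (1.0, 2.0):
        print(q, tsallis_entropy(uniform, q), tsallis_entropy(frozen, q))
```

With q = 2 the uniform policy over four actions gets 1 - 4 · (1/4)² = 0.75 and the deterministic policy gets 0; with q = 1 the uniform policy gets ln 4, the Shannon value.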