Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration
Jinning Li, Xinyi Liu, Banghua Zhu, Jiantao Jiao, Masayoshi Tomizuka, Chen Tang, Wei Zhan
Safe Reinforcement Learning (RL) aims to find a policy that achieves high rewards while satisfying cost constraints. When learning from scratch, safe RL agents tend to be overly conservative, which impedes exploration and limits overall performance. In many realistic tasks, e.g., autonomous driving, large-scale expert demonstration data are available. We argue that extracting expert policy f...