A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna

In various control task domains, existing controllers provide a baseline level of performance that—though possibly suboptimal—should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below a baseline level during training. ...