Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
Yuchen Wu, Melissa Mozifian, Florian Shkurti
The potential benefits of model-free reinforcement learning for real robotics systems are limited by its uninformed exploration, which leads to slow convergence, poor data efficiency, and unnecessary interactions with the environment. To address these drawbacks, we propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent po...