Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

Yuxin Pan, Fangzhen Lin

Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model to reduce interactions with the real environment. A recent model-based RL method learns a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, to additionally generate backward ro...