Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

Yang Liu, Marius Hofert

Offline reinforcement learning (RL) aims to optimize a policy from pre-collected data so as to maximize the cumulative reward obtained by performing a sequence of actions. Existing approaches learn a value function from historical data and then update the policy parameters by maximizing that value function at a single time step. Driven by the gap between maximizing the cumulative rewards of RL...