Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Farzan Memarian, Wonjoon Goo, Rudolf Lioutikov, Scott Niekum, Ufuk Topcu

We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any reinforcement learning (RL) algorithm in sparse-reward environments by automatically densifying rewards. The proposed framework alternates between classification-based reward inference and policy update steps: the original sparse reward provides a self-supervisory signal for reward inference by ranking trajectories...
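
As an illustration of the reward-inference step described above, the following minimal Python sketch ranks trajectory pairs by their sparse returns and fits a reward model with a pairwise logistic (classification) loss. The linear reward model, the per-step feature representation, the hyperparameters, and the names `summed_features`, `predicted_return`, and `fit_dense_reward` are assumptions made for illustration and are not taken from the paper. The policy update step is omitted.

```python
import math


def summed_features(trajectory):
    """Sum the per-step feature vectors of one trajectory (a list of equal-length lists)."""
    return [sum(column) for column in zip(*trajectory)]


def predicted_return(w, trajectory):
    """Return of a trajectory under the assumed linear reward model r(s) = w . s."""
    return sum(wi * fi for wi, fi in zip(w, summed_features(trajectory)))


def fit_dense_reward(trajectories, sparse_returns, lr=0.05, epochs=100):
    """Fit reward weights so that predicted returns preserve the sparse-return ranking.

    Each pair of trajectories with different sparse returns is one binary
    classification example: the label says which trajectory ranks higher.
    """
    dim = len(trajectories[0][0])
    w = [0.0] * dim
    feats = [summed_features(t) for t in trajectories]
    n = len(trajectories)
    # Only pairs with distinct sparse returns carry a ranking signal.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if sparse_returns[i] != sparse_returns[j]]
    for _ in range(epochs):
        for i, j in pairs:
            label = 1.0 if sparse_returns[i] > sparse_returns[j] else 0.0
            diff = [a - b for a, b in zip(feats[i], feats[j])]
            logit = sum(wi * di for wi, di in zip(w, diff))
            logit = max(-30.0, min(30.0, logit))  # clamp to keep exp() well-behaved
            p_i = 1.0 / (1.0 + math.exp(-logit))  # P(trajectory i ranks above j)
            # Gradient step on the binary cross-entropy loss for this pair.
            w = [wi - lr * (p_i - label) * di for wi, di in zip(w, diff)]
    return w
```

In this sketch the learned weights define a dense per-step reward `w . s` that could be passed to any RL algorithm in place of the sparse signal. In an alternating scheme, the reward fit would be repeated as new trajectories are collected.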