Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim

The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning of reward functions. A well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential b...