Aligning Human Preferences with Baseline Objectives in Reinforcement Learning

Daniel Marta, Simon Holk, Christian Pek, Jana Tumova, Iolanda Leite

Practical implementations of deep reinforcement learning (deep RL) have been challenging due to a multitude of factors, such as designing reward functions that cover every possible interaction. To address the heavy burden of robot reward engineering, we aim to leverage subjective human preferences gathered in the context of human-robot interaction, while taking advantage of a baseline reward func...