POLITE: Preferences Combined with Highlights in Reinforcement Learning
Simon Holk, Daniel Marta, Iolanda Leite
Many approaches to the challenge of robot learning have been devised, notably by exploring novel ways for humans to communicate complex goals and tasks in reinforcement learning (RL) setups. One approach that has attracted recent research interest addresses the problem directly by treating human feedback as preferences between pairs of trajectories (sequences of state-action pairs). However...