Active Reward Learning from Online Preferences

Vivek Myers, Erdem Bıyık, Dorsa Sadigh

Robot policies need to adapt to human preferences and/or new environments. Human experts may have the domain knowledge required to help robots achieve this adaptation. However, existing works often require costly offline re-training on human feedback, and that feedback usually needs to be frequent and is too complex for humans to provide reliably. To avoid placing undue burden on human experts an...