Active Reward Learning from Online Preferences

robot, ICRA 2023

Vivek Myers, Erdem Bıyık, Dorsa Sadigh

Robot policies need to adapt to human preferences and to new environments. Human experts may have the domain knowledge required to help robots achieve this adaptation. However, existing work often requires costly offline re-training on human feedback, and that feedback usually needs to be frequent and is too complex for humans to provide reliably. To avoid placing undue burden on human experts an...
