VARIQuery: VAE Segment-Based Active Learning for Query Selection in Preference-Based Reinforcement Learning

Daniel Marta, Simon Holk, Christian Pek, Jana Tumova, Iolanda Leite

Human-in-the-loop reinforcement learning (RL) methods actively integrate human knowledge to create reward functions for various robotic tasks. Learning from preferences shows promise because it alleviates the need for demonstrations by querying humans on state-action sequences. However, the limited granularity of sequence-based approaches complicates temporal credit assignment. The amount of human q...