Learning Video-Conditioned Policies for Unseen Manipulation Tasks

Elliot Chane-Sane, Cordelia Schmid, Ivan Laptev

The ability of a non-expert user to specify robot commands is critical for building generalist agents capable of solving a large variety of tasks. One convenient way to specify the intended robot goal is with a video of a person demonstrating the target task. While prior work typically aims to imitate human demonstrations performed in robot environments, here we focus on a more realistic and challen...