Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision

Ashvin Nair, Brian Zhu, Gokul Narayanan, Eugen Solowjow, Sergey Levine

Learning-based methods in robotics hold the promise of generalization, but what can be done if a learned policy does not generalize to a new situation? In principle, if an agent can at least evaluate its own success (i.e., with a reward classifier that generalizes well even when the policy does not), it could actively practice the task and finetune the policy in this situation. We study this probl...