GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin, Byoung-Tak Zhang

Language-Guided Robotic Manipulation (LGRM) is a challenging task, as it requires a robot to understand human instructions in order to manipulate everyday objects. Recent approaches to LGRM rely on pre-trained Visual Grounding (VG) models to detect objects without adapting to manipulation environments. This results in a performance drop due to the substantial domain gap between the pre-training data and real-world...