Grounding Conversational Robots on Vision Through Dense Captioning and Large Language Models

robot, ICRA 2024

Lucrezia Grassi, Zhouyang Hong, Carmine Tommaso Recchiuto, Antonio Sgorbissa

This work explores a novel approach to empowering robots with visual perception capabilities using textual descriptions. Our approach involves the integration of GPT-4 with dense captioning, enabling robots to perceive and interpret the visual world through detailed text-based descriptions. To assess both user experience and the technical feasibility of this approach, experiments were conducted wi...
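As a rough illustration of the idea in the abstract (region-level captions turned into text that a language model can reason over), the following minimal sketch formats dense-captioning output into a prompt. It is not the authors' implementation: the `RegionCaption` type, the bounding-box layout, the `build_scene_prompt` helper, and the prompt wording are all assumptions made for this example, and no captioning model or GPT-4 call is included.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegionCaption:
    """One dense-captioning result (hypothetical structure)."""
    text: str                       # description of the image region
    box: Tuple[int, int, int, int]  # assumed layout: (x, y, width, height) in pixels


def build_scene_prompt(captions: List[RegionCaption], user_utterance: str) -> str:
    """Turn region captions into a textual scene description for a language model."""
    lines = [
        f"- {c.text} (region at x={c.box[0]}, y={c.box[1]}, "
        f"w={c.box[2]}, h={c.box[3]})"
        for c in captions
    ]
    # Fall back to an explicit marker when the captioner returns nothing.
    scene = "\n".join(lines) if lines else "- (nothing detected)"
    return (
        "You are a conversational robot. "
        "The following descriptions are what you currently see:\n"
        f"{scene}\n"
        f"User: {user_utterance}\n"
        "Robot:"
    )


if __name__ == "__main__":
    captions = [
        RegionCaption("a red mug on a wooden table", (120, 200, 80, 90)),
        RegionCaption("a person wearing glasses", (300, 40, 150, 320)),
    ]
    print(build_scene_prompt(captions, "What can you see in front of you?"))
```

The resulting string would then be passed to whichever language model the robot uses. Keeping the scene description as plain text means the captioning component and the language model can be swapped independently.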
