LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai

3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions about their environment. Existing approaches often rely on extensive labeled data or struggle with complex language queries. We propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline...