Towards Unified Interactive Visual Grounding in The Wild

Jie Xu, Hanbo Zhang, Qingyi Si, Yifeng Li, Xuguang Lan, Tao Kong

Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical because natural language is inevitably ambiguous. It requires robots to disambiguate the user’s input by actively gathering information. Previous approaches often rely on predefined templates to ask disambiguation questions, which degrades performance in realistic interactive scenarios. In this pap...