Towards Unified Interactive Visual Grounding in The Wild
Jie Xu, Hanbo Zhang, Qingyi Si, Yifeng Li, Xuguang Lan, Tao Kong
Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical because natural language is inevitably ambiguous. It requires robots to disambiguate the user’s input by actively gathering information. Previous approaches often rely on predefined templates to ask disambiguation questions, which reduces performance in realistic interactive scenarios. In this pap...