Audio-Visual Grounding Referring Expression for Robotic Manipulation
Yefei Wang, Kaili Wang, Yi Wang, Di Guo, Huaping Liu, Fuchun Sun
People commonly use referring expressions to single out a specific target in everyday dialogue. In this paper, we introduce a novel task: audio-visual grounding of referring expressions for robotic manipulation. The robot leverages both audio and visual information to understand the referring expression in a given manipulation instruction, and the corresponding manipulations are implement...