Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

A crucial ability of mobile intelligent agents is to integrate evidence from multiple sensory inputs in an environment and to take a sequence of actions to reach their goals. In this paper, we address the problem of Audio-Visual Embodied Navigation: the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given ...