Depth-Aware Vision-and-Language Navigation using Scene Query Attention Network

Sinan Tan, Mengmeng Ge, Di Guo, Huaping Liu, Fuchun Sun

Vision-and-language navigation (VLN) is an important task in robotics and computer vision. However, most existing VLN models use only features extracted from RGB observations as input, even though real-world robots can also use depth sensors. Prior research has also shown that simply adding a depth stream to neural models could only provide a margina...