Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning
Liqi Yan, Dongfang Liu, Yaoxian Song, Changbin Yu
Vision and voice are two vital keys for agents’ interaction and learning. In this paper, we present a novel indoor navigation model called Memory Vision-Voice Indoor Navigation (MVV-IN), which receives voice commands and analyzes multimodal information of visual observation in order to enhance robots’ environment understanding. We make use of single RGB images taken by a first-view monocular camera....


