RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

Pierre Sermanet, Tianli Ding, Jeffrey Zhao, Fei Xia, Debidatta Dwibedi, Keerthana Gopalakrishnan, Christine Chan, Gabriel Dulac-Arnold, Sharath Maddineni, Nikhil J Joshi, Pete Florence, Wei Han, Robert Baruch, Yao Lu, Suvir Mirchandani, Peng Xu, Pannag Sanketi, Karol Hausman, Izhak Shafran, Brian Ichter, Yuan Cao

We present a scalable, bottom-up, and intrinsically diverse data collection scheme that supports high-level reasoning over medium and long horizons and achieves 2.2x higher throughput than traditional, narrow, top-down step-by-step collection. We collect realistic data by performing arbitrary user requests throughout three entire office buildings and by using multiple embodiments (robot, human,...