Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation

Jared Mejia, Victoria Dean, Tess Hellebrekers, Abhinav Gupta

Although pretraining on large amounts of data benefits robot learning, current paradigms perform large-scale pretraining only for visual representations, whereas representations for other modalities are trained from scratch. In contrast to the abundance of visual data, it is unclear what relevant internet-scale data could be used to pretrain other modalities such as tactile sensing. S...