Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki

Object tracking is central to robot perception and scene understanding, allowing robots to parse a video stream in terms of moving objects with names. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories [1], [2]. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static ima...