Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

Shuo Xin, Zhen Zhang, Mengmeng Wang, Xiaojun Hou, Yaowei Guo, Xiao Kang, Liang Liu, Yong Liu

Tracking a specific person in a 3D scene is gaining momentum due to its numerous applications in robotics. Currently, most 3D trackers focus on driving scenarios with negligible jitter and uncomplicated surroundings, so their performance degrades severely in complex environments, especially on jolting robot platforms (only 20-60% success rate). To improve the accuracy, a Point-Video-based Transfo...