VG4D: Vision-Language Model Goes 4D Video Recognition
Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu
Understanding the real world through point cloud video is crucial for robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition are limited by sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLMs) pre-trained on web-scale text-image datasets can learn fine-grained vis...