LSSAttn: Towards Dense and Accurate View Transformation for Multi-modal 3D Object Detection

Qi Jiang, Hao Sun

Fusing camera and LiDAR information in a unified BEV representation is an elegant paradigm for 3D detection. Current multi-modal fusion methods in BEV can be categorized as LSS-based or Transformer-based according to their view transformation. The former relies on inaccurate depth prediction and a massive number of pseudo points for the perspective-to-BEV transformation, while the latter ...