3D Object Detection Based on 4D Millimeter-Wave Radar and Vision Fusion

doi:10.19678/j.issn.1000-3428.0070113

Abstract

To address pedestrian and vehicle recognition and localization in autonomous driving scenarios, this paper proposes CDCAM-BEV, an algorithm that fuses 4D millimeter-wave radar and vision to improve detection accuracy. First, a radar pillar network encodes the 4D radar point cloud into a pseudo-image, and an orthographic feature transform converts the monocular image into bird's-eye view (BEV) features. Second, a common information extraction module (CICAM) and a differential information extraction module (DICAM), both built on cross-attention, extract the information that radar and image share and the information in which they differ. Finally, a BEV feature fusion module built on CICAM and DICAM performs feature-level fusion of image and radar information in BEV space. The algorithm is evaluated on the challenging View-of-Delft (VOD) dataset against five other 3D object detection algorithms. CDCAM-BEV outperforms them in the 3D, BEV, and AOS modes: its average detection accuracy is 3.65% higher than that of second-ranked Part-A² in 3D mode, 5.04% higher than that of second-ranked PointPillars in BEV mode, and 2.62% higher than that of second-ranked Part-A² in AOS mode. These results indicate that CDCAM-BEV effectively fuses image and 4D radar point cloud features and improves detection accuracy and reliability.
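The first step, encoding a radar point cloud into a pseudo-image, can be illustrated with a minimal sketch. It is not the paper's radar pillar network: the function name `pillarize`, the point layout `(x, y, z, rcs, v_r)`, the grid ranges, and the cell size are all assumptions chosen for illustration, and a plain per-cell mean stands in for the learned per-pillar encoder that a real network would use.

```python
from collections import defaultdict


def pillarize(points, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell=1.0):
    """Scatter radar points into a BEV pseudo-image of shape [H][W][C].

    points: iterable of (x, y, z, rcs, v_r) tuples (hypothetical layout).
    Each occupied cell stores the mean of its points' (z, rcs, v_r);
    empty cells stay at zero. A learned per-pillar encoder would replace
    this mean in an actual detection network.
    """
    width = int(round((x_range[1] - x_range[0]) / cell))   # columns along x
    height = int(round((y_range[1] - y_range[0]) / cell))  # rows along y

    # Group point features by the grid cell (pillar) they fall into.
    buckets = defaultdict(list)
    for x, y, z, rcs, v_r in points:
        inside_x = x_range[0] <= x < x_range[1]
        inside_y = y_range[0] <= y < y_range[1]
        if not (inside_x and inside_y):
            continue  # drop points outside the BEV grid
        col = min(int((x - x_range[0]) / cell), width - 1)
        row = min(int((y - y_range[0]) / cell), height - 1)
        buckets[(row, col)].append((z, rcs, v_r))

    # Write one feature vector per occupied pillar into a dense grid.
    channels = 3
    image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for (row, col), feats in buckets.items():
        n = len(feats)
        image[row][col] = [sum(f[c] for f in feats) / n for c in range(channels)]
    return image


if __name__ == "__main__":
    demo_points = [
        (0.5, -1.5, 1.0, 2.0, 3.0),
        (0.7, -1.2, 3.0, 4.0, 5.0),
        (3.5, 1.5, 0.0, -1.0, 0.5),
    ]
    bev = pillarize(demo_points)
    print(len(bev), len(bev[0]), bev[0][0])
```

Because the result is a dense grid, it can be processed by ordinary 2D convolutions and aligned cell by cell with image features that have been projected into the same BEV grid, which is what makes feature-level fusion in BEV space possible.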

Li Jianlang, Wu Xindian, Chen Ling, Yang Bo, Tang Wensheng. 3D Object Detection Based on 4D Millimeter-Wave Radar and Vision Fusion[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0070113.

