
Computer Engineering, 2026, Vol. 52, Issue (2): 299-310. doi: 10.19678/j.issn.1000-3428.0070113

• Multimodality and Information Fusion •

3D Object Detection Algorithm Based on 4D Millimeter-Wave Radar and Vision Fusion

LI Jianlang1, WU Xindian1, CHEN Ling2, YANG Bo2, TANG Wensheng1

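  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China;
    2. College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China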
  • Received: 2024-07-12 Revised: 2024-08-18 Published: 2024-11-05
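  • About the authors: LI Jianlang (CCF Student Member), male, master's degree candidate, whose main research interest is multi-sensor fusion; WU Xindian, master's degree candidate; CHEN Ling, associate professor, Ph.D.; YANG Bo and TANG Wensheng (corresponding author), professors, Ph.D. E-mail: tangws@hunnu.edu.cn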
  • Funding:
    General Program of the National Natural Science Foundation of China (62072175).

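Abstract: To address the recognition and localization of pedestrians and vehicles in autonomous driving scenarios, this study proposes CDCAM-BEV (Common and Differential Cross-Attention Module-Bird's-Eye View), a four-dimensional (4D) millimeter-wave radar and vision fusion algorithm designed to improve object detection accuracy. First, a radar pillar network is designed to encode the 4D radar point cloud into a pseudo-image, and the monocular image is converted into Bird's-Eye View (BEV) features through the Orthographic Feature Transform (OFT). Second, a Common Information Extraction Module (CICAM) and a Differential Information Extraction Module (DICAM) are built on the cross-attention mechanism to fully exploit both the information that radar and images share and the information that differs between them. Finally, a BEV feature fusion module based on CICAM and DICAM fuses image and radar information at the feature level in the BEV space. Experiments on the View-of-Delft (VoD) dataset compare CDCAM-BEV with five other three-dimensional (3D) object detection algorithms. The results show that CDCAM-BEV outperforms the other algorithms in multiple evaluation modes. In 3D mode, its average detection precision is 3.65 percentage points higher than that of the second-ranked Part-A2; in BEV mode, it is 5.04 percentage points higher than that of the second-ranked PointPillars; and in Average Orientation Similarity (AOS) mode, it is 2.62 percentage points higher than that of the second-ranked Part-A2. These results indicate that CDCAM-BEV performs strongly in every mode and effectively fuses image and 4D radar point cloud features, substantially improving the accuracy and reliability of object detection.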

Key words: 4D millimeter-wave radar, Bird's-Eye View (BEV), autonomous driving, cross-attention mechanism, 3D object detection
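
The abstract names the stages of CDCAM-BEV but does not specify the internals of the radar pillar network, the OFT branch, CICAM or DICAM. As a rough, non-authoritative illustration, the pure-Python sketch below shows two generic ingredients: scattering 4D radar points into a BEV pseudo-image (PointPillars-style, with the learned PointNet replaced by a fixed element-wise max over raw point attributes) and single-head scaled dot-product cross-attention from radar BEV cells to image BEV cells. Treating the attended image features as "common" information and the radar-minus-attended residual as "differential" information is our assumption for illustration only, not the authors' CICAM/DICAM design. The function names `scatter_pillars`, `cross_attention` and `fuse_bev`, the (x, y, z, rcs, v_r) point layout, the placeholder image tokens and the omission of learned projections are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, float, float]  # (x, y, z, rcs, v_r): assumed layout
Vector = List[float]


def scatter_pillars(points: Sequence[Point], x_min: float, y_min: float,
                    cell: float, nx: int, ny: int) -> List[List[Vector]]:
    """Scatter radar points into an ny x nx BEV pseudo-image of 3-channel pillars.

    Simplification (assumption): a pillar's feature is the element-wise max of
    its points' (z, rcs, v_r); PointPillars applies a learned PointNet first.
    Empty pillars stay all-zero.
    """
    grid = [[[0.0, 0.0, 0.0] for _ in range(nx)] for _ in range(ny)]
    occupied = [[False] * nx for _ in range(ny)]
    for x, y, z, rcs, v_r in points:
        ix = math.floor((x - x_min) / cell)
        iy = math.floor((y - y_min) / cell)
        if not (0 <= ix < nx and 0 <= iy < ny):
            continue  # point lies outside the BEV grid
        attrs = [z, rcs, v_r]
        if occupied[iy][ix]:
            grid[iy][ix] = [max(a, b) for a, b in zip(grid[iy][ix], attrs)]
        else:
            grid[iy][ix] = attrs
            occupied[iy][ix] = True
    return grid


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Sequence[Vector], keys: Sequence[Vector],
                    values: Sequence[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention without learned projections."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def fuse_bev(radar_tokens: Sequence[Vector], image_tokens: Sequence[Vector]) -> List[Vector]:
    """Hypothetical fusion: concatenate [radar | common | differential] per BEV cell."""
    common = cross_attention(radar_tokens, image_tokens, image_tokens)
    diff = [[r - c for r, c in zip(rt, ct)] for rt, ct in zip(radar_tokens, common)]
    return [list(rt) + ct + dt for rt, ct, dt in zip(radar_tokens, common, diff)]


if __name__ == "__main__":
    pts = [(0.5, 0.5, 1.0, 10.0, -2.0), (0.7, 0.2, 1.5, 5.0, -1.0),
           (2.5, 1.5, 0.3, 3.0, 0.5), (-1.0, 0.5, 0.0, 1.0, 0.0)]
    grid = scatter_pillars(pts, x_min=0.0, y_min=0.0, cell=1.0, nx=3, ny=2)
    radar = [c for row in grid for c in row]          # 6 BEV cells, 3 channels each
    image = [[0.1 * i, 1.0, -0.5] for i in range(6)]  # placeholder image BEV tokens
    fused = fuse_bev(radar, image)
    print(len(fused), len(fused[0]))  # 6 9
```

Running the file prints `6 9`: six BEV cells, each carrying three radar channels, three attended ("common") channels and three residual ("differential") channels; the point at x = -1.0 falls outside the grid and is dropped. In a trained model the queries, keys and values would presumably pass through learned projections, and the image BEV tokens would come from the OFT branch rather than from placeholders.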
