
Computer Engineering ›› 2024, Vol. 50 ›› Issue (11): 246-257. doi: 10.19678/j.issn.1000-3428.0068696

• Graphics and Image Processing •


Multi-Level Rotational Equivariant Object Detection Network Based on BEV Fusion

LIU Hongwei1,3, SHAO Dongheng2,3, YANG Jian4,*, WEI Xian2,3, LI Ke4, YOU Xiong4

  1. School of Advanced Manufacturing, Fuzhou University, Quanzhou 362000, Fujian, China
    2. Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350108, Fujian, China
    3. Quanzhou Institute of Equipment Manufacturing Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362000, Fujian, China
    4. School of Geospatial Information, Strategic Support Force Information Engineering University, Zhengzhou 450001, Henan, China
  • Received: 2023-10-25 Online: 2024-11-15 Published: 2024-04-01
  • Contact: YANG Jian
  • Supported by: Key Program of the National Natural Science Foundation of China (42130112); General Program of the National Natural Science Foundation of China (42371479); Open Project of the KartoBit Research Network (KRN2201CA)


Abstract:

With the development of autonomous driving systems, 3D object detection in road scenes has garnered widespread attention. However, most object detection methods based on a single sensor or on multi-sensor fusion do not account for vehicle rotation in real road scenes, which rotates the captured scene in tandem and degrades detection performance. To address this problem, this study proposes a multi-level global rotation-equivariant object detection network architecture based on multi-sensor fusion, which alleviates the detection difficulties caused by scene rotation and thereby improves detection performance. First, the distances between points inside each voxel are encoded to enhance the local geometric information of the point cloud, and global rotation-equivariant voxel features are extracted. Second, semantic information from the image is introduced and its global rotation-equivariant features are extracted, further improving network performance. Finally, the rotation-equivariant point cloud and image features are fused on a Bird's-Eye View (BEV), and a group-equivariant network is embedded to extract global rotation-equivariant features at the fused-BEV level. Experimental results on the nuScenes validation set show that the architecture achieves a mean Average Precision (mAP) of 68.7% and a nuScenes Detection Score (NDS) of 71.7, while the mean Average Orientation Error (mAOE) decreases to 0.288. Compared with mainstream object detection methods, the proposed network realizes rotational equivariance in the architecture itself and improves performance. Moreover, each component plays an important role in improving the overall detection performance of the architecture.

Key words: multi-sensor fusion, voxels, Bird's-Eye View (BEV), rotational equivariance, 3D object detection
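
The abstract describes a three-stage pipeline: distance encoding inside each voxel, rotation-equivariant image feature extraction, and group-equivariant convolution over the fused BEV map. The following is a minimal sketch of the first and third ideas in PyTorch; all names (VoxelDistanceEncoder, C4EquivariantConv) are hypothetical, the group is assumed to be the cyclic group C4, and the fusion operator is assumed to be concatenation, since the abstract does not specify the paper's actual group order, feature dimensions, or fusion details.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelDistanceEncoder(nn.Module):
    """Hypothetical step-1 module: encode pairwise distances between the
    points inside each voxel. Distance statistics are rotation-invariant;
    centered coordinates carry the (equivariant) local geometry.
    Expected input: points of shape (V, P, 3) -- V voxels, P points each."""
    def __init__(self, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(5, out_dim), nn.ReLU())

    def forward(self, points):
        d = torch.cdist(points, points)                      # (V, P, P) pairwise distances
        stats = torch.stack([d.mean(-1), d.amax(-1)], -1)    # (V, P, 2) per-point distance stats
        rel = points - points.mean(dim=1, keepdim=True)      # (V, P, 3) centered coordinates
        feat = torch.cat([rel, stats], dim=-1)               # (V, P, 5)
        return self.mlp(feat).max(dim=1).values              # (V, out_dim) per-voxel feature

class C4EquivariantConv(nn.Module):
    """Hypothetical step-3 module: a lifting convolution over the cyclic
    group C4. Rotating the input BEV map by 90 degrees rotates the output
    maps and cyclically permutes the group dimension (up to boundary effects)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.pad = k // 2

    def forward(self, bev):                                  # (B, in_ch, H, W)
        outs = [F.conv2d(bev, torch.rot90(self.weight, r, dims=(2, 3)),
                         padding=self.pad)
                for r in range(4)]                           # one conv per 90-degree rotation
        return torch.stack(outs, dim=2)                      # (B, out_ch, 4, H, W)

# Usage sketch: fuse LiDAR and camera BEV features (concatenation assumed)
# and extract group-equivariant features from the fused map.
lidar_bev = torch.randn(1, 64, 128, 128)
camera_bev = torch.randn(1, 64, 128, 128)
fused = torch.cat([lidar_bev, camera_bev], dim=1)            # (1, 128, 128, 128)
feat = C4EquivariantConv(in_ch=128, out_ch=32)(fused)        # (1, 32, 4, 128, 128)

Rotating a shared kernel over the group elements is the standard lifting-convolution construction for discrete rotation equivariance; the paper's multi-level global design presumably applies analogous equivariant feature extraction in the voxel and image branches as well.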