
Computer Engineering ›› 2022, Vol. 48 ›› Issue (11): 104-110,144. doi: 10.19678/j.issn.1000-3428.0062440

• Artificial Intelligence and Pattern Recognition •

Three-Dimensional Object Detection of Radar Point Cloud Based on Sparse Transformer

HAN Lei1, GAO Yongbin1, SHI Zhicai1,2

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China;
  2. Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai 200240, China
  • Received: 2021-08-23 Revised: 2021-12-10 Published: 2022-11-05
  • About the authors: HAN Lei (born 1996), male, master's student, whose main research interests are image and point cloud object detection; GAO Yongbin (corresponding author), associate professor; SHI Zhicai, professor.
  • Funding:
    Open Project of the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (AGK2019004).

Abstract: With the development of computer vision, point-cloud-based three-dimensional (3D) object detection algorithms are widely used in autonomous driving, robot control, and other fields. To address the poor robustness and low detection accuracy of such algorithms when the point cloud is sparse, this study proposes a 3D object detection algorithm based on a sparse Transformer. In the attention matrix generation stage, the sparse Transformer module explicitly selects the Top-t weight elements, retaining the weights that benefit feature extraction; this reduces the impact of environmental noise points on robustness and speeds up the Transformer module. In the regression stage, the bounding boxes generated by a coarse regression module based on spatial features serve as the initial anchor boxes of the detection-head module, which then performs fine regression of the bounding boxes. A loss function for voxel-based 3D object detection is designed to accurately measure the classification, position regression, and direction losses. Experimental results on the KITTI dataset show that, compared with the PointPillars algorithm, the proposed algorithm improves the mean average precision (mAP) by 3.46%, effectively improving the detection accuracy for 3D point cloud objects and offering better robustness. Compared with the original Transformer module, the proposed sparse Transformer module runs about 0.54 frame/s faster on average on point cloud images.

Key words: machine vision, three-dimensional object detection, sparse Transformer, coarse regression, loss function
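
The Top-t selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the selection is applied, so this sketch assumes that the t largest scaled dot-product scores are kept for each query, that all other positions are masked out, and that the softmax is taken over the kept scores only. The function name `sparse_topt_attention` and its parameters are hypothetical, and the code uses plain Python lists for readability, so it says nothing about the running speed reported in the paper.

```python
import math


def sparse_topt_attention(queries, keys, values, t):
    """Attention that keeps only the t largest scores per query (assumed scheme).

    queries, keys, values are lists of equal-length float vectors.
    Returns (outputs, weights), where weights[i][j] is the attention
    weight of query i on key j (exactly 0.0 for masked positions).
    """
    if not 1 <= t <= len(keys):
        raise ValueError("t must be between 1 and the number of keys")
    dim_k = len(keys[0])
    dim_v = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim_k) for k in keys]
        # Indices of the t largest scores; every other position is dropped.
        keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:t]
        # Softmax over the kept scores only (max subtracted for stability).
        top = max(scores[j] for j in keep)
        exps = {j: math.exp(scores[j] - top) for j in keep}
        total = sum(exps.values())
        weights = [exps.get(j, 0.0) / total for j in range(len(scores))]
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim_v)]
        )
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: one query, three keys; the lowest-scoring key is masked out.
demo_out, demo_weights = sparse_topt_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    t=2,
)
print(demo_weights[0])
print(demo_out[0])
```

With `t` equal to the number of keys, the sketch reduces to ordinary dense softmax attention; smaller values of `t` give zero weight to low-scoring positions, which is the mechanism the abstract credits with suppressing environmental noise points.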

CLC number: