Computer Engineering ›› 2022, Vol. 48 ›› Issue (8): 249-257. doi: 10.19678/j.issn.1000-3428.0062134

• Graphics and Image Processing •

Improved RetinaNet Algorithm for Object Detection

YU Min1,2, QU Dan2, SI Nianwen2   

  1. School of Software, Zhengzhou University, Zhengzhou 450000, China;
    2. School of Information Systems Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450000, China
  • Received:2021-07-20 Revised:2021-08-26 Published:2021-08-30

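  • About the authors: YU Min (born 1997), female, master's student, whose main research interests are intelligent information processing and object detection; QU Dan, professor, Ph.D.; SI Nianwen, lecturer, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (62171470, 61673395).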

Abstract: The classical one-stage object detection algorithm RetinaNet has difficulty fully extracting and fusing features from different stages, and its bounding box regression is not sufficiently accurate. To address these problems, an improved RetinaNet algorithm for object detection is proposed. First, multispectral channel attention is added to the feature extraction module; it incorporates more frequency components of the input features into the attention processing and thereby captures the rich information originally present in the features. Second, a multiscale feature fusion module, consisting of a path aggregation module and a feature fusion operation, is added after the feature extraction module. The path aggregation module builds a bottom-up path and uses the accurate localization signals in the shallower feature layers to enhance the information flow of the entire feature pyramid. The feature fusion operation fuses the feature information from every stage to further improve the fusion of multistage features. Finally, the Complete Intersection over Union (CIoU) loss function is introduced into the bounding box regression process. This loss considers three important geometric factors, namely the overlap area of the bounding boxes, the distance between their center points, and their aspect ratios, to improve both the convergence speed and the accuracy of the regression. Experimental results on the MS COCO and PASCAL VOC datasets show that, compared with RetinaNet, the improved algorithm raises the average precision on the two datasets by 2.1 and 1.1 percentage points, respectively. The improvement is especially pronounced for the detection of larger objects in the MS COCO dataset.

Key words: deep learning, object detection, multi-spectral channel attention, multi-scale feature fusion, Complete Intersection over Union (CIoU)
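
As an illustration of the three geometric factors that the abstract attributes to the CIoU loss (overlap area, center-point distance, and aspect ratio), the following is a minimal sketch of the commonly published CIoU formulation for a single pair of boxes. It is not taken from the article: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for this example, and the authors' implementation may differ in detail.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box.

    Both boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    This box layout and the eps term are assumptions for this sketch.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Factor 1: overlap area, expressed as IoU.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Factor 2: squared distance between the box centers, normalized by
    # the squared diagonal of the smallest box enclosing both boxes.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    enclose_w = max(x2, gx2) - min(x1, gx1)
    enclose_h = max(y2, gy2) - min(y1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Factor 3: aspect-ratio consistency term v and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

In this sketch, two identical boxes give a loss close to zero, while two non-overlapping boxes still produce a loss that grows with the distance between their centers. That distance term gives the regression a useful signal even when the IoU term alone is flat.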

