Improved RetinaNet Algorithm for Object Detection

doi:10.19678/j.issn.1000-3428.0062134

Abstract

To address two problems of the classical one-stage object detector RetinaNet, namely that it cannot fully extract and fuse features from different stages and that its bounding box regression is not sufficiently accurate, an improved RetinaNet algorithm for object detection is proposed. First, multispectral channel attention is added to the feature extraction module. It incorporates multiple frequency components of the input features into the attention computation, so that more of the rich information in the original features is captured. Next, a multiscale feature fusion module, consisting of a path aggregation module and a feature fusion operation, is added after the feature extraction module. The path aggregation module builds a bottom-up path that uses the accurate localization signals of shallower feature layers to strengthen the information flow through the entire feature pyramid, and the feature fusion operation fuses the feature information from every stage to further improve the fusion of multistage features. Finally, the Complete Intersection over Union (CIoU) loss function is introduced for bounding box regression. This loss accounts for three important geometric factors, namely the overlap area of the boxes, the distance between their center points, and their aspect ratios, which improves both the convergence speed and the accuracy of regression. Experimental results on the MS COCO and PASCAL VOC datasets show that, compared with RetinaNet, the improved algorithm increases Average Precision (AP) by 2.1 and 1.1 percentage points on the two datasets, respectively, and the gain is especially significant for large objects in MS COCO.

Key words: deep learning, object detection, multi-spectral channel attention, multi-scale feature fusion, Complete Intersection over Union (CIoU)
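
The multispectral channel attention described in the abstract follows the FcaNet idea: instead of squeezing each channel with global average pooling, the channels are split into groups and each group is compressed with a different 2D discrete cosine transform (DCT) basis before an SE-style gate rescales the channels. The NumPy sketch below is illustrative only; the chosen frequencies, the reduction ratio and the random FC weights are hypothetical, and the paper's exact configuration is not given here. Because the (0, 0) basis is all ones, a single-frequency version reduces to (scaled) global average pooling.

```python
import numpy as np


def dct_basis(u, v, H, W):
    """Unnormalized 2D DCT-II basis for frequency (u, v) on an H x W grid."""
    bh = np.cos(np.pi * u * (np.arange(H) + 0.5) / H)
    bw = np.cos(np.pi * v * (np.arange(W) + 0.5) / W)
    return np.outer(bh, bw)  # shape (H, W); all ones when u = v = 0


def multispectral_pool(x, freqs):
    """Compress each channel of x (C, H, W) to a scalar with one DCT frequency.

    Channels are split into len(freqs) equal groups; group k uses freqs[k].
    """
    C, H, W = x.shape
    n = len(freqs)
    assert C % n == 0, "channels must split evenly among the frequencies"
    g = C // n
    out = np.empty(C)
    for k, (u, v) in enumerate(freqs):
        basis = dct_basis(u, v, H, W)
        out[k * g:(k + 1) * g] = (x[k * g:(k + 1) * g] * basis).sum(axis=(1, 2))
    return out


def multispectral_channel_attention(x, freqs, w1, w2):
    """SE-style channel gate driven by multi-frequency descriptors.

    w1 has shape (C // r, C) and w2 has shape (C, C // r) (hypothetical weights).
    """
    z = multispectral_pool(x, freqs)              # (C,) one descriptor per channel
    hidden = np.maximum(w1 @ z, 0.0)              # FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # FC + sigmoid, values in (0, 1)
    return x * gate[:, None, None]                # rescale every channel


# Usage with hypothetical sizes: 8 channels, 7x7 map, 4 frequencies, r = 4
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 7, 7))
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]
w1 = 0.1 * rng.standard_normal((2, 8))
w2 = 0.1 * rng.standard_normal((8, 2))
y = multispectral_channel_attention(x, freqs, w1, w2)
print(y.shape)  # (8, 7, 7)
```

Using several frequencies lets different channel groups summarize different spatial patterns, which is how the method keeps information that plain average pooling (the lowest frequency only) discards.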
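
The multiscale feature fusion module in the abstract has two steps. The sketch below models the bottom-up path on PANet (N_1 = P_1, N_{i+1} = down(N_i) + P_{i+1}) and, as one plausible reading of "fusing the feature information from each stage", models the fusion step on the Libra R-CNN balanced feature pyramid: every level is resized to one reference resolution, the levels are averaged, and the average is resized back and added to each level. Simplifications and swaps are assumptions: 2x2 max pooling stands in for PANet's stride-2 3x3 convolution, all convolutions and any refinement step are omitted, and the four-level pyramid with power-of-2 sizes is hypothetical.

```python
import numpy as np


def down2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) map with even H and W."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))


def up2(x):
    """Nearest-neighbour 2x upsampling on a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def resize_to(x, size):
    """Halve or double x until its spatial size equals `size`."""
    while x.shape[1] > size:
        x = down2(x)
    while x.shape[1] < size:
        x = up2(x)
    assert x.shape[1] == size, "level sizes must differ by powers of 2"
    return x


def bottom_up_path(pyramid):
    """PANet-style path: N_1 = P_1, N_{i+1} = down(N_i) + P_{i+1}.

    `pyramid` is ordered from the finest (largest) level to the coarsest.
    Max pooling stands in for the stride-2 conv; all convs are omitted.
    """
    outs = [pyramid[0]]
    for p in pyramid[1:]:
        outs.append(down2(outs[-1]) + p)
    return outs


def balanced_fusion(levels, ref=1):
    """Average all levels at the resolution of levels[ref], then add the
    average back to every level at its own resolution (Libra R-CNN style)."""
    size = levels[ref].shape[1]
    fused = np.mean([resize_to(lvl, size) for lvl in levels], axis=0)
    return [lvl + resize_to(fused, lvl.shape[1]) for lvl in levels]


# Usage with a hypothetical 4-level pyramid of 4-channel maps
rng = np.random.default_rng(0)
P = [rng.standard_normal((4, s, s)) for s in (32, 16, 8, 4)]
F = balanced_fusion(bottom_up_path(P))
print([f.shape for f in F])  # [(4, 32, 32), (4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

The bottom-up path gives coarse levels a short route to the fine-grained localization detail of shallow levels, while the averaging step lets every level receive the same mixture of information from all stages.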
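
The CIoU loss combines the three geometric factors named in the abstract. In the formulation of Zheng et al. (the Distance-IoU paper), with rho the distance between the box centers, c the diagonal of the smallest box enclosing both boxes, v = (4 / pi^2) * (arctan(w_gt / h_gt) - arctan(w / h))^2 and alpha = v / ((1 - IoU) + v), the loss is L_CIoU = 1 - IoU + rho^2 / c^2 + alpha * v. The plain-Python sketch below evaluates it for a single pair of boxes in (x1, y1, x2, y2) format; it assumes positive widths and heights, and a training implementation would operate on batched tensors instead.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap-area term: IoU
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Center-distance term: rho^2 / c^2 (c = diagonal of the enclosing box)
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term: v measures the mismatch, alpha weights it
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


print(round(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.9683 (= 1 - 1/7 + 1/9)
print(round(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)), 4))  # 1.4 (IoU = 0)
```

For any two non-overlapping boxes the plain IoU loss is stuck at 1, whereas the center-distance term keeps growing as the boxes move apart, so the regression still receives a useful signal; this is one reason the formulation is credited with faster convergence.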

CLC Number: TP391.41

YU Min, QU Dan, SI Nianwen. Improved RetinaNet Algorithm for Object Detection[J]. Computer Engineering, 2022, 48(8): 249-257.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0062134

http://www.ecice06.com/EN/Y2022/V48/I8/249

Figures/Tables: 14
