
Computer Engineering, 2023, Vol. 49, Issue (12): 178-185. doi: 10.19678/j.issn.1000-3428.0066683

• Graphics and Image Processing •

  • About the authors:

    Lu HUANG (born 1997), male, M.S. candidate; his main research interests are object detection and computer vision

    Zeping LI, professor, Ph.D.

    Wenbang YANG, Ph.D. candidate

    Yong ZHAO, associate professor, Ph.D.

    Di ZHANG, M.S. candidate

  • Supported by:
    National Natural Science Foundation of China (61462014); Guizhou Provincial Youth Science and Technology Talent Growth Project (Qian Jiao He KY [2018]411)

Multi-Scale Object Detection Algorithm Based on Regional Perception

Lu HUANG1, Zeping LI1, Wenbang YANG1, Yong ZHAO2, Di ZHANG1   

  1. College of Computer Science and Technology, State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
    2. School of Information Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, Guangdong, China
  • Received: 2023-01-04 Online: 2023-12-15 Published: 2023-03-22


Abstract:

A multi-scale object detection algorithm based on regional perception is proposed to address the loss of feature information in the main branch layers and the imbalanced feature expression capabilities across scales in object detection networks. On the basis of YOLOv5, a stronger baseline model is constructed using data augmentation, an improved bounding-box loss, and Non-Maximum Suppression (NMS). A Channel Information Enhancement Module (CIEM) is designed along the channel direction using operations such as Global Maximum Pooling (GMP), Global Average Pooling (GAP), and convolution, and is applied to each main branch layer of the backbone network, so that the detection heads do not lose the key features of the main branch layers during feature fusion, thereby strengthening the model's perception of key regions. A Weighted Feature Fusion Method (WFFM) is used to fuse feature information from different scales, balancing the contributions of input features at different scales to the output features and improving the model's perception of multi-scale objects. By adjusting the channel width and depth of the model, four network structures of different scales are designed. Experimental results show that, compared with YOLOv5s, the proposed algorithm improves the mean Average Precision (mAP) by 5.48, 3.00, 1.94, 0.70, and 1.95 percentage points on the Pascal VOC, MS COCO, Global Wheat, Wider Face, and Motor Defect datasets, respectively. Moreover, the algorithm achieves a maximum mAP of 50.7%, which is 7.2 and 3.0 percentage points higher than the largest models of YOLOv4 and Dynamic Head, respectively.
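The following is a minimal PyTorch sketch, not the authors' implementation, illustrating how a channel information enhancement block and a weighted feature fusion step of the kind described above could look. The class names, the shared 1x1-convolution bottleneck, the reduction ratio, the SiLU activation, and the BiFPN-style normalized weights are assumptions for illustration; the abstract does not specify the exact structure.

```python
# Illustrative sketch (assumed design, not the paper's exact modules).
import torch
import torch.nn as nn


class ChannelInformationEnhancement(nn.Module):
    """Hypothetical CIEM: turns GMP and GAP channel statistics into an
    attention map that re-weights a main-branch feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # Global Average Pooling
        self.gmp = nn.AdaptiveMaxPool2d(1)   # Global Maximum Pooling
        self.fc = nn.Sequential(             # shared 1x1 convs on channel stats
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.sigmoid(self.fc(self.gap(x)) + self.fc(self.gmp(x)))
        return x * attn                       # emphasize informative channels


class WeightedFeatureFusion(nn.Module):
    """Hypothetical WFFM: fuses same-shaped inputs from different scales with
    learnable, normalized non-negative weights (BiFPN-style)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):                # list of tensors with equal shape
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)          # weights sum to roughly 1
        return sum(wi * xi for wi, xi in zip(w, inputs))


if __name__ == "__main__":
    x = torch.randn(1, 256, 40, 40)
    ciem = ChannelInformationEnhancement(256)
    fuse = WeightedFeatureFusion(num_inputs=2)
    y = fuse([ciem(x), torch.randn(1, 256, 40, 40)])
    print(y.shape)  # torch.Size([1, 256, 40, 40])
```

In this sketch the CIEM output keeps the input resolution, so it can be inserted after each backbone main-branch layer before the features enter the fusion path; the fusion weights are learned per fusion node, which is one common way to balance multi-scale contributions.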

Key words: object detection, enhanced baseline model, channel information enhancement, weighted feature fusion, multi-scale object