Small-Scale Object Detection Algorithm Integrating Attention and Feature Pyramids

doi:10.19678/j.issn.1000-3428.0066724

Abstract:

To address the weak detection performance of Faster R-CNN on small objects and on occluded or truncated objects, an improved Faster R-CNN algorithm is proposed that combines the CBAM attention mechanism with a feature pyramid structure. To focus on the informative local regions of the feature maps, CBAM is integrated into the feature extraction network, which reduces interference from irrelevant targets and improves detection of occluded or truncated objects. A Feature Pyramid Network (FPN) structure is introduced to connect high-level and low-level features, yielding high-resolution, semantically strong features and thereby improving small-object detection. To alleviate vanishing gradients and reduce the number of hyperparameters, the VGG16 backbone is replaced with VS-ResNet, an inverted-residual network with stronger expressive power. VS-ResNet modifies part of the layer structure of the original ResNet 50, adds auxiliary classifiers, and adopts inverted residuals and group convolution so that the information passing through the activation function is fully preserved in the high-dimensional space, which improves detection accuracy. A method that resets the scores of candidate boxes is used to compensate for the tendency of the Non-Maximum Suppression (NMS) algorithm to mistakenly eliminate overlapping detection boxes. Experimental results show that VS-ResNet is 2.97 percentage points more accurate than VGG16 on the CIFAR-10 dataset, and that the proposed algorithm reaches an mAP of 76.2% on the Pascal VOC 2012 dataset, 13.9 percentage points higher than the original Faster R-CNN algorithm.
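The feature-pyramid step in the abstract, connecting high-level and low-level features, can be illustrated with a minimal sketch of top-down fusion. It assumes single-channel maps stored as nested lists, with each level exactly half the size of the previous one. The 1x1 lateral convolutions and 3x3 smoothing convolutions of a full FPN are omitted, and the function names `upsample2x` and `top_down_merge` are hypothetical. It is not the paper's implementation.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def top_down_merge(pyramid):
    """Fuse feature maps ordered fine -> coarse.

    Assumption: every level is exactly half the height and width of the
    level before it, so a 2x upsample aligns the shapes.
    """
    merged = [pyramid[-1]]                         # the coarsest level is kept as-is
    for lateral in reversed(pyramid[:-1]):
        up = upsample2x(merged[0])                 # carry semantics down one level
        fused = [
            [l + u for l, u in zip(lat_row, up_row)]
            for lat_row, up_row in zip(lateral, up)
        ]
        merged.insert(0, fused)
    return merged


# Usage: three toy levels (4x4, 2x2, 1x1) with constant values.
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[10.0] * 2 for _ in range(2)]
c4 = [[100.0]]
p2, p3, p4 = top_down_merge([c2, c3, c4])
```

Each fused level keeps the resolution of its low-level input while adding the upsampled higher-level signal, which is the property the abstract relies on for small objects.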

Key words: deep learning, attention mechanism, feature pyramid, small object detection, truncated object detection
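The abstract does not give the formula used to reset candidate-box scores. As a stand-in, the sketch below uses the Gaussian score decay of Soft-NMS: boxes that overlap the current best box have their scores lowered instead of being deleted outright. The function names and the values of `sigma` and `score_thresh` are assumptions made for illustration, not the paper's method or settings.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescore_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Soft-NMS-style rescoring (assumed stand-in for the paper's score reset).

    Returns (box, score) pairs in the order they were selected.
    """
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the highest-scoring remaining candidate.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay the scores of the rest according to their overlap with it.
        rescored = []
        for box, score in candidates:
            new_score = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if new_score >= score_thresh:          # drop only near-zero scores
                rescored.append((box, new_score))
        candidates = rescored
    return kept


# Usage: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
detections = rescore_nms(boxes, scores)
```

With hard NMS at an IoU threshold of 0.5, the second box would be removed, since its IoU with the first is about 0.68. Here it survives with a reduced score, which is the behaviour the abstract describes for overlapping objects.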

Wenshun SHENG, Xiongfeng YU, Jiayan LIN, Xin CHEN. Small-Scale Object Detection Algorithm Integrating Attention and Feature Pyramids[J]. Computer Engineering, 2024, 50(1): 242-250.

http://www.ecice06.com/CN/Y2024/V50/I1/242

Figures/Tables 17

Fig.1 Framework of Faster R-CNN algorithm

Fig.2 Structure of region proposal network

Fig.3 Structure of convolution block

Fig.4 Schematic diagram of auxiliary classifier

Fig.5 Schematic diagram of CBAM attention mechanism

Fig.6 Structure of feature pyramid network

Fig.7 Overall structure of CF-RCNN

Fig.8 Comparison of error rates among VS-ResNet, ResNet 50, VGG16

Fig.9 Comparison of truncated object detection results

Fig.10 Comparison of detection results before and after algorithm optimization

