Small Target Detection Based on Improved SSD Algorithm

doi:10.19678/j.issn.1000-3428.0065253

Abstract

SSD is a classical single-stage target detection algorithm that makes predictions from feature maps at six scales generated on different convolutional layers. However, its shallow feature maps have insufficient nonlinearity and lack semantic information, and small targets contain few pixels, so they lose much of their information after repeated convolution operations. As a result, the detection accuracy for small targets is far lower than that for medium and large targets. This paper proposes a strategy that fuses multi-scale features with a hybrid attention mechanism: after the original backbone network is replaced, a bottom-up downsampling path and a top-down upsampling path are constructed. Specifically, the downsampling path uses a self-attention mechanism to adaptively enhance shallow spatial features and deep semantic features. In the upsampling path, the semantic information of deep features is enhanced by fusing the local and global information of feature maps at three scales; spatial and coordinate attention mechanisms are introduced to enrich the semantic and position information of the feature maps to be fused; and a self-attention enhancement module is used to strengthen the expressive power of the fused features. Experimental results show that, with an input image size of 512×512 pixels, the proposed algorithm achieves a mean Average Precision (mAP) of 84.6% on the PASCAL VOC dataset and 89.6% on the HRRSD dataset, which is 6.1 and 8.8 percentage points higher, respectively, than the SSD algorithm.

Keywords: deep learning, attention mechanism, small target detection, feature enhancement, feature fusion
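The abstract names a self-attention mechanism but this page does not describe the module's internals. As a rough illustration only, the sketch below implements generic scaled dot-product self-attention in the style of reference 20, with identity query/key/value projections. The function names and the identity projections are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys, and values are the input vectors themselves
    (identity projections); a trained module would use learned projections.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    output = []
    for query in features:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Each output vector is a weighted average of all input vectors.
        output.append([sum(w * value[j] for w, value in zip(weights, features))
                       for j in range(dim)])
    return output


if __name__ == "__main__":
    # Three hypothetical 2-dimensional feature vectors.
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
        print(row)
```

Because each output vector is a weighted average of all inputs, every position can draw on information from the whole feature map, which is the property the abstract relies on when it speaks of adaptive enhancement.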

Shan WU (吴珊), Feng ZHOU (周凤). Small Target Detection Based on Improved SSD Algorithm[J]. Computer Engineering, 2023, 49(7): 179-188.

https://www.ecice06.com/CN/Y2023/V49/I7/179

Figures/Tables: 18

References: 31

1	TONG K, WU Y Q, ZHOU F. Recent advances in small object detection based on deep learning: a review. Image and Vision Computing, 2020, 97(5): 103910.
2	LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324. doi: 10.1109/5.726791
3	GIRSHICK R. Fast R-CNN[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2016: 1440-1448.
4	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of IEEE Transactions on Pattern Analysis and Machine Intelligence. Washington D. C., USA: IEEE Press, 2016: 1137-1149.
5	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 779-788.
6	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 6517-6525.
7	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2022-06-14]. https://arxiv.org/abs/1804.02767.
8	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[M]//LEIBE B, MATAS J, SEBE N, et al. Computer vision-ECCV 2016. Berlin, Germany: Springer International Publishing, 2016: 21-37.
9	FU C, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. [2022-06-14]. https://arxiv.org/abs/1701.06659.
10	LI Z, ZHOU F. FSSD: feature fusion single shot multibox detector[EB/OL]. [2022-06-14]. https://arxiv.org/abs/1712.00960.
11	ZHAO W Q, ZHOU Z D, ZHAI Y J. SSD small target detection algorithm based on deconvolution and feature fusion. CAAI Transactions on Intelligent Systems, 2020, 15(2): 310-316. (in Chinese)
12	GAO N, WU Q, ZHANG M D. Multi-scale feature enhancement based SSD algorithm. Journal of Hebei University of Technology, 2022, 51(2): 23-30. (in Chinese)
13	KANG J, LIU G, GUO G F. Weed detection based on multi-scale fusion module and feature enhancement. Transactions of the Chinese Society for Agricultural Machinery, 2022, 53(4): 254-260. (in Chinese)
14	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 7132-7141.
15	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[M]//FERRARI V, HEBERT M, SMINCHISESCU C, et al. Computer vision-ECCV 2018. New York, USA: ACM Press, 2018: 3-19.
16	WANG Q L, WU B G, ZHU P F, et al. ECA-net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 11531-11539.
17	MAO T Y, SONG Y, ZHENG L. Apple target detection based on multi-scale and hybrid attention mechanism. Journal of South-Central University for Nationalities (Natural Science Edition), 2022, 41(2): 235-242. (in Chinese)
18	ZHAO Y M, WANG J C, REN H E, et al. A small object detection algorithm integrated with ReFPN and compound attention mechanism. Journal of Harbin University of Science and Technology, 2022, 27(2): 85-91. (in Chinese)
19	WANG R Q, WANG H Q, WANG K. Fire smoke detection combined with detailed features and hybrid attention mechanism. Chinese Journal of Liquid Crystals and Displays, 2022, 37(7): 900-912. (in Chinese)
20	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
21	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 936-944.
22	SHI W Z, CABALLERO J, HUSZÁR F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 1874-1883.
23	JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks[EB/OL]. [2022-06-14]. https://arxiv.org/abs/1506.02025.
24	HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2021: 13708-13717.
25	TAO Y T, XU M Z, ZHANG F, et al. Unsupervised-restricted deconvolutional neural network for very high resolution remote-sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(12): 6805-6823.
26	DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[EB/OL]. [2022-06-14]. https://arxiv.org/abs/1605.06409v2.
27	JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[EB/OL]. [2022-06-14]. https://arxiv.org/abs/1705.09587.
28	CUI L S, MA R, LV P, et al. MDSSD: multi-scale deconvolutional single shot detector for small objects. Science China Information Sciences, 2020, 63(2): 120113.
29	SUN H, SUN X, WANG H Q, et al. Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model. IEEE Geoscience and Remote Sensing Letters, 2012, 9(1): 109-113.
30	CHENG G, ZHOU P C, HAN J W. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7405-7415.
31	LU X Q, ZHANG Y L, YUAN Y, et al. Gated and axis-concentrated localization network for remote sensing object detection. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(1): 179-192.

Algorithm	Parameters/10⁶	Floating-point operations/(10⁹ frames·s^-1)
SSD^[8]	26.52	30.96
Proposed algorithm	58.70	15.49

Parameter	Value
Epoch	300
Batch_size	32
Learning_rate	0.004
Momentum	0.9
Weight_decay	0.0005
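The table lists hyperparameter values but this page does not name the optimizer. The momentum and weight-decay entries suggest stochastic gradient descent with momentum, so the sketch below assumes that update rule (in the common form v = momentum·v + g, w = w − lr·v). The dictionary keys and the function name are hypothetical and chosen only for this illustration.

```python
# Values copied from the hyperparameter table above; key names are assumptions.
HYPERPARAMS = {
    "epochs": 300,
    "batch_size": 32,
    "learning_rate": 0.004,
    "momentum": 0.9,
    "weight_decay": 0.0005,
}


def sgd_momentum_step(weights, grads, velocity, hp=HYPERPARAMS):
    """Apply one SGD-with-momentum update with L2 weight decay.

    Assumption: the paper's optimizer is SGD with momentum; the page
    itself only gives the numeric settings.
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + hp["weight_decay"] * w      # add the L2 penalty gradient
        v = hp["momentum"] * v + g          # accumulate the velocity
        new_velocity.append(v)
        new_weights.append(w - hp["learning_rate"] * v)
    return new_weights, new_velocity


if __name__ == "__main__":
    # One hypothetical scalar weight with a constant gradient of 0.5.
    w, v = [1.0], [0.0]
    for step in range(3):
        w, v = sgd_momentum_step(w, [0.5], v)
        print(step, w[0], v[0])
```

With a constant gradient the velocity grows from step to step, so the effective step size increases beyond the nominal learning rate of 0.004; this is the usual reason momentum is paired with a small base rate.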

Algorithm	Backbone	Input size/pixels	mAP/%	Frame rate/(frames·s^-1)
Faster R-CNN^[4]	VGG-16	1000×600	73.2	7.0
YOLOv2^[6]	Darknet19	416×416	76.8	67.0
YOLOv3^[7]	Darknet53	416×416	79.3	39.0
SSD^[8]	VGG-16	300×300	77.2	46.0
SSD^[8]	VGG-16	512×512	78.5	19.0
DSSD^[9]	ResNet101	321×321	78.6	9.5
FSSD^[10]	VGG-16	300×300	78.8	65.8
FSSD^[10]	VGG-16	512×512	80.9	35.7
R-FCN^[26]	ResNet101	1000×600	79.5	5.8
RSSD^[27]	VGG-16	300×300	78.5	35.0
RSSD^[27]	VGG-16	512×512	80.8	16.6
MDSSD^[28]	VGG-16	300×300	78.6	32.2
MDSSD^[28]	VGG-16	512×512	81.0	14.5
Proposed algorithm	Darknet53	300×300	81.3	34.9
Proposed algorithm	Darknet53	512×512	84.6	18.5
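The gains quoted in the abstract can be checked against the SSD and proposed-algorithm rows of the comparison table above. The short sketch below copies those four rows and computes the difference in mAP (percentage points) and in frame rate; the dictionary layout and the function name are assumptions made for this example.

```python
# (algorithm, input size) -> (mAP in %, frames per second),
# copied from the SSD and proposed-algorithm rows of the table above.
RESULTS = {
    ("SSD", 300): (77.2, 46.0),
    ("SSD", 512): (78.5, 19.0),
    ("Proposed", 300): (81.3, 34.9),
    ("Proposed", 512): (84.6, 18.5),
}


def gain_over_ssd(size):
    """Return (mAP gain in percentage points, frame-rate change in frames/s)."""
    ssd_map, ssd_fps = RESULTS[("SSD", size)]
    new_map, new_fps = RESULTS[("Proposed", size)]
    return round(new_map - ssd_map, 1), round(new_fps - ssd_fps, 1)


if __name__ == "__main__":
    for size in (300, 512):
        print(size, gain_over_ssd(size))
```

At 512×512 the difference is 6.1 percentage points of mAP, which agrees with the PASCAL VOC figure in the abstract, at a cost of 0.5 frames per second. At 300×300 the gain is 4.1 percentage points with a frame-rate drop of 11.1 frames per second.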
