Remote Sensing Small Object Detection Network Based on Improved YOLOv5

doi:10.19678/j.issn.1000-3428.0065935

Abstract

Small object detection in remote sensing imagery is difficult because of complex backgrounds, high image resolution, and limited effective information, and existing detectors are prone to false and missed detections. This study proposes YOLOv5-RS, a remote sensing small object detection algorithm based on YOLOv5. To reduce interference from complex backgrounds and negative samples, a parallel mixed attention module is constructed; it optimizes the generation of the weighted feature map by replacing fully connected layers with convolutions and removing pooling layers. To obtain and propagate richer, more discriminative small object features, the downsampling factor is adjusted and shallow features rich in small object information are added during model training. In addition, a feature extraction module combining convolution and Multi-Head Self-Attention (MHSA) is designed; by jointly representing local and global information, it overcomes the limitations of ordinary convolutional extraction and enlarges the model's receptive field. The EIoU loss function is used to optimize the regression between prediction and detection boxes, improving the localization of small objects. Experiments on remote sensing small object datasets verify the effectiveness of the algorithm. Compared with YOLOv5s, the proposed algorithm improves average detection accuracy by 1.5 percentage points while reducing the parameter count by 20%; for small vehicle targets, average detection accuracy increases by 3.2 percentage points. Compared with EfficientDet, YOLOx, and YOLOv7, the proposed algorithm effectively balances detection accuracy and real-time performance.
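
As a rough illustration of the EIoU regression loss named in the abstract, the sketch below computes the commonly published form of the loss for a single pair of axis-aligned boxes: one minus IoU, plus a normalized center-distance penalty, plus separate width and height penalties normalized by the smallest enclosing box. It is not the authors' implementation. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example only.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2).

    Assumed form (illustrative, not taken from the paper's code):
        L = 1 - IoU
            + center_dist^2 / enclosing_diag^2
            + (w_pred - w_gt)^2 / enclosing_w^2
            + (h_pred - h_gt)^2 / enclosing_h^2
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of both boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    # Partially overlapping boxes of equal size: only the IoU and
    # center-distance terms contribute.
    print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because the width and height penalties are computed separately, a predicted box with the correct center but the wrong size is still penalized directly. This is one plausible reason such a loss might help with small objects, where a few pixels of size error are a large relative error.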

Key words: remote sensing small object detection, improved YOLOv5, parallel mixed attention, global feature fusion, loss function

Jiaxin LI, Jin HOU, Boying SHENG, Yuhang ZHOU. Remote Sensing Small Object Detection Network Based on Improved YOLOv5[J]. Computer Engineering, 2023, 49(9): 256-264.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0065935

http://www.ecice06.com/EN/Y2023/V49/I9/256

Figures/Tables 15

Fig.1 Overall architecture of YOLOv5-RS network

Fig.2 Structure of CBAM-P module

Fig.3 Structure of feature pyramid network

Fig.4 Structure of BottleTransformer module

Fig.5 Prediction results for the same target area

Fig.6 Distribution of object sizes on the DOTA-v dataset

Fig.7 Visualization results of attention mechanism

Fig.8 Comparison of detection results between the YOLOv5-RS and YOLOv5s algorithms
