Small Object Detection Algorithm for Aerial Photography Based on Improved YOLOv3

doi:10.19678/j.issn.1000-3428.0068698

Abstract

To address the low detection precision, missed detections, and false detections that affect small objects, this study proposes an improved You Only Look Once version 3 (YOLOv3) algorithm for small object detection. First, in terms of network structure, the DenseNet-121 Densely Connected Network (DenseNet) replaces the original Darknet-53 as the backbone network to improve feature extraction capability, and the convolution kernel size is modified to further reduce the loss of feature map information. To enhance the robustness of the detection model for small objects, a fourth feature detection layer with a size of 104×104 pixels is added. Second, in terms of feature map fusion, bilinear interpolation replaces the original nearest-neighbor interpolation for upsampling, to address the serious feature loss found in most detection algorithms. Finally, in terms of the loss function, Generalized Intersection over Union (GIoU) is used instead of Intersection over Union (IoU) to calculate the bounding box loss, and the Focal Loss function is introduced as the confidence loss of the bounding box. Experimental results show that the improved algorithm achieves a mean Average Precision (mAP) of 63.3% on the VisDrone2019 dataset, which is 13.2 percentage points higher than that of the original YOLOv3 detection model, and runs at 52 frames/s on a GTX 1080 Ti device. The improved algorithm therefore performs well on small objects.

Key words: small object detection, You Only Look Once version 3 (YOLOv3), Densely Connected Network (DenseNet), loss function, Generalized Intersection over Union (GIoU)
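
The two loss terms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `(x1, y1, x2, y2)` box format, the function names, and the defaults `alpha=0.25` and `gamma=2.0` (the values commonly quoted from the Focal Loss paper) are assumptions made here for illustration, since the abstract does not state them.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Bounding box regression loss: 0 for identical boxes, up to 2 for distant ones."""
    return 1.0 - giou(box_a, box_b)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p and label y in {0, 1}.

    alpha and gamma are assumed defaults, not values taken from the abstract.
    """
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

For two non-overlapping boxes, IoU is 0 regardless of how far apart they are, whereas `giou_loss` still grows with the gap between them, which is why GIoU can serve as a regression loss where IoU alone gives no signal.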

XI Qi, WANG Mingjie, WEI Jinghe, ZHAO Wei. Small Object Detection Algorithm for Aerial Photography Based on Improved YOLOv3[J]. Computer Engineering, 2025, 51(6): 184-192.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0068698

https://www.ecice06.com/EN/Y2025/V51/I6/184

Figures/Tables 9

Fig.1 Structure of YOLOv3 model

Fig.2 Structure of DenseNet

Fig.3 Structure of improved backbone network

Fig.4 Structure of improved YOLOv3 model

Fig.5 Schematic diagram of bilinear interpolation method

Fig.6 Schematic diagram of GIoU calculation

Fig.7 Detection results comparison among different algorithms
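
The difference between the nearest-neighbor upsampling of the original YOLOv3 and the bilinear interpolation referred to in Fig.5 can be sketched on a tiny grid. This is an illustrative sketch only: the function names, the fixed scale factor of 2, and the half-pixel-center sampling convention are assumptions made here, and the paper's own implementation may differ.

```python
def upsample_nearest(grid, scale=2):
    """Nearest-neighbor upsampling: each source value is copied into a scale x scale block."""
    return [
        [grid[r // scale][c // scale] for c in range(len(grid[0]) * scale)]
        for r in range(len(grid) * scale)
    ]


def upsample_bilinear(grid, scale=2):
    """Bilinear upsampling using the half-pixel-center convention (an assumption)."""
    h, w = len(grid), len(grid[0])
    out = []
    for r in range(h * scale):
        # Map the output pixel center back into source coordinates, clamped to the grid.
        y = min(max((r + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for c in range(w * scale):
            x = min(max((c + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


feature = [[0.0, 4.0]]
print(upsample_nearest(feature))   # [[0.0, 0.0, 4.0, 4.0], [0.0, 0.0, 4.0, 4.0]]
print(upsample_bilinear(feature))  # [[0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 3.0, 4.0]]
```

Nearest-neighbor upsampling produces a hard step between the two source values, while bilinear interpolation blends them, giving a smoother feature map before it is fused with the shallower layer.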
