Dense Pedestrian Detection Algorithm Based on Improved YOLOv5

doi:10.19678/j.issn.1000-3428.0068753

Abstract:

To address the low accuracy of existing pedestrian detection methods on dense or small-target pedestrians, this study proposes YOLOv5_Conv-SPD_DAFPN, a comprehensively improved model based on You Only Look Once (YOLO) v5 that incorporates a non-strided Convolution Space-to-Depth (Conv-SPD) module and a Double Asymptotic Feature Pyramid Network (DAFPN). First, because the feature information of small or densely packed pedestrians is easily lost, a Conv-SPD module is introduced into the backbone network to replace the original strided convolutions, which mitigates this loss. Second, because nonadjacent feature maps are not fused directly, which results in a low feature fusion rate, a new DAFPN is proposed to improve the accuracy and precision of pedestrian detection. Finally, building on the Efficient Intersection over Union (EIoU) and Complete-IoU (CIoU) losses, the EfficiCIoU_Loss localization loss function is introduced to adjust and improve bounding-box regression and to help the network converge faster. Compared with the original YOLOv5 model on the CrowdHuman and WiderPerson pedestrian datasets, the proposed model improves mAP@0.5 and mAP@0.5:0.95 by 3.9 and 5.3 percentage points and by 2.1 and 2.1 percentage points, respectively. With EfficiCIoU_Loss, the model convergence speed increases by 11% and 33%, respectively. Together, these changes improve feature information retention, multiscale fusion, and loss function optimization in YOLOv5-based dense pedestrian detection, and thereby its performance and efficiency in practical applications.
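The space-to-depth rearrangement behind Conv-SPD can be illustrated with a minimal sketch. It assumes a scale factor of 2, a feature map stored as nested Python lists in [C][H][W] order, and a particular ordering of the output slices. The function name `space_to_depth` and these choices are illustrative assumptions rather than details taken from the paper, and the non-strided convolution that would follow the rearrangement is omitted.

```python
def space_to_depth(fmap, scale=2):
    """Rearrange a [C][H][W] feature map into [C*scale*scale][H//scale][W//scale].

    Illustrative helper (not from the paper): every input value is kept,
    but spatial positions are moved into the channel dimension.
    """
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % scale or width % scale:
        raise ValueError("H and W must be divisible by scale")
    out = []
    # One sub-map per (row offset, column offset) pair; slice order is an assumption.
    for dy in range(scale):
        for dx in range(scale):
            for c in range(channels):
                out.append([row[dx::scale] for row in fmap[c][dy::scale]])
    return out


# A single-channel 4x4 map with values 0..15.
fmap = [[[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]]
slices = space_to_depth(fmap)
print(len(slices), slices[0])  # 4 [[0, 2], [8, 10]]
```

Plain stride-2 subsampling would keep only the first of the four slices. The rearrangement keeps every input value and halves the spatial resolution by moving detail into the channel dimension.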

Key words: dense pedestrian detection, small target pedestrian detection, Conv-SPD network, Double Asymptotic Feature Pyramid Network (DAFPN), EfficiCIoU_Loss loss function
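For reference, the two published losses that EfficiCIoU_Loss builds on can be sketched as follows. The code implements the standard CIoU and EIoU definitions for a single pair of axis-aligned boxes in (x1, y1, x2, y2) form. The function names are illustrative assumptions, and the way the paper combines the two losses into EfficiCIoU_Loss is not stated in this abstract, so no combined loss is shown.

```python
import math


def _box_terms(box, gt):
    """Shared quantities for a predicted box and a target box, both (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    # Squared distance between the two box centres.
    rho2 = ((x1 + x2 - gx1 - gx2) ** 2 + (y1 + y2 - gy1 - gy2) ** 2) / 4.0
    return iou, rho2, cw, ch, w, h, gw, gh


def ciou_loss(box, gt):
    """CIoU: IoU term + normalised centre distance + aspect-ratio penalty."""
    iou, rho2, cw, ch, w, h, gw, gh = _box_terms(box, gt)
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v) if v > 0 else 0.0
    return 1.0 - iou + rho2 / (cw ** 2 + ch ** 2) + alpha * v


def eiou_loss(box, gt):
    """EIoU: IoU term + normalised centre distance + separate width and height penalties."""
    iou, rho2, cw, ch, w, h, gw, gh = _box_terms(box, gt)
    return (1.0 - iou + rho2 / (cw ** 2 + ch ** 2)
            + (w - gw) ** 2 / cw ** 2 + (h - gh) ** 2 / ch ** 2)


# Prediction twice as wide as the target, same height.
pred, target = (0.0, 0.0, 4.0, 2.0), (0.0, 0.0, 2.0, 2.0)
print(ciou_loss(pred, target), eiou_loss(pred, target))
```

In this example EIoU penalizes the width gap directly, whereas CIoU sees it only through the aspect-ratio term, so the EIoU value is the larger of the two.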

HU Qian, PI Jianyong, HU Weichao, HUANG Kun, WANG Juanmin. Dense Pedestrian Detection Algorithm Based on Improved YOLOv5[J]. Computer Engineering, 2025, 51(3): 216-228.

https://www.ecice06.com/CN/Y2025/V51/I3/216

Figures/Tables: 19

Fig.1 ASFF algorithm model

Fig.2 YOLOv5_Conv-SPD_DAFPN network architecture

Fig.3 Conv-SPD module

Fig.4 DAFPN network architecture

Fig.5 Feature map enhancement operation

Fig.6 ASFF2 operation

Fig.7 Performance metric changes of different models on the CrowdHuman dataset

Fig.8 Performance metric changes of different models on the WiderPerson dataset

Fig.9 Visualization results of YOLOv5 and YOLOv5_Conv-SPD_DAFPN

Fig.10 Visualization result 1 of YOLOv5

Fig.11 Visualization result of YOLOv5+Conv-SPD

Fig.12 Visualization result 2 of YOLOv5

Fig.13 Visualization result of YOLOv5+DAFPN

Fig.14 Change in the loss value of the EfficiCIoU_Loss model

