Improved YOLOv7 Algorithm for Crowded Pedestrian Detection

doi:10.19678/j.issn.1000-3428.0067741

Abstract

To address the missed and false detections that detection algorithms are prone to in crowded pedestrian scenes, this study proposes an improved YOLOv7 algorithm for crowded pedestrian detection. A BiFormer vision transformer and an improved Efficient Layer Aggregation Network module based on RepConv and a Channel Space Attention Module (CSAM), termed RC-ELAN, are introduced into the backbone network. Through the self-attention mechanism and the attention module, the backbone focuses more on the important features of occluded pedestrians, which mitigates the adverse effect of missing target features on detection. The neck network is redesigned following the idea of the Bidirectional Feature Pyramid Network (BiFPN): transposed convolution and an improved Rep-ELAN-W module allow the model to make efficient use of the small-target feature information in the low- and mid-dimensional feature maps, which improves detection of small pedestrians. An Efficient Complete Intersection-over-Union (E-CIoU) loss function is introduced so that the model converges to a higher accuracy. Experimental results on the WiderPerson dataset, which contains a large number of small and occluded pedestrians, show that, compared with YOLOv7, YOLOv5, and YOLOX, the improved YOLOv7 algorithm raises average precision at an IoU threshold of 0.5 and over IoU thresholds of 0.5-0.95 by 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points, respectively. The algorithm is therefore well suited to crowded pedestrian detection.

Key words: machine vision, crowded pedestrian detection, attention mechanism, YOLO series algorithms, Bidirectional Feature Pyramid Network (BiFPN)
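The abstract evaluates detections at IoU thresholds and trains with an IoU-based regression loss. As a rough illustration, the sketch below computes plain IoU and the standard Complete IoU (CIoU) loss for two axis-aligned boxes. It is not the E-CIoU loss of the paper, whose definition is not given on this page; the function names `iou` and `ciou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for the example.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target, eps=1e-9):
    """Standard CIoU loss: 1 - IoU + centre-distance term + aspect-ratio term.

    Illustrative stand-in only; the paper's E-CIoU variant may differ.
    """
    overlap = iou(pred, target)

    # Squared distance between the two box centres.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - overlap) + v + eps)

    return 1 - overlap + rho2 / c2 + alpha * v
```

Unlike plain IoU, the centre-distance term still provides a training signal when the predicted and ground-truth boxes do not overlap, which is one reason losses of this family are used for box regression.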

Fangxin XU, Rong FAN, Xiaolu MA. Improved YOLOv7 Algorithm for Crowded Pedestrian Detection[J]. Computer Engineering, 2024, 50(3): 250-258.

https://www.ecice06.com/CN/Y2024/V50/I3/250

Figures/Tables 14

Fig.1 YOLOv7 network structure

Fig.2 BiFormer module structure

Fig.3 Channel space attention module structure

Fig.4 RC-ELAN module structure

Fig.5 Rep-ELAN-W module structure

Fig.6 Improved backbone network structure

Fig.7 BiFPN structure

Fig.8 Transposed convolution principle

Fig.9 Improved neck network structure

Fig.10 Improved YOLOv7 network structure

Fig.11 Comparison of detection effects
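The figure itself is not reproduced here, so the principle named in Fig.8 can be sketched in code instead. The minimal single-channel example below shows how a transposed convolution upsamples a feature map: each input value is multiplied by the kernel and accumulated into a strided position of a larger output. The function name `transposed_conv2d`, the 2x2 kernel, and the stride of 2 are assumptions chosen for illustration; the settings used in the paper's neck network are not stated on this page.

```python
def transposed_conv2d(x, kernel, stride=2):
    """Single-channel 2D transposed convolution without padding.

    x and kernel are lists of lists of numbers. The output size per
    dimension is (in_size - 1) * stride + kernel_size.
    """
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]

    # Scatter each input value, scaled by the kernel, into the output.
    for i in range(in_h):
        for j in range(in_w):
            for u in range(k_h):
                for v in range(k_w):
                    out[i * stride + u][j * stride + v] += x[i][j] * kernel[u][v]
    return out
```

With a 2x2 kernel of ones and a stride of 2, a 2x2 input becomes a 4x4 output in which every input value fills a 2x2 block, which is the same as nearest-neighbour upsampling. In a trained network the kernel weights are learned, so the upsampling can adapt to the data.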

