
Computer Engineering ›› 2024, Vol. 50 ›› Issue (3): 250-258. doi: 10.19678/j.issn.1000-3428.0067741

• Graphics and Image Processing •

Improved YOLOv7 Algorithm for Crowded Pedestrian Detection

Fangxin XU1,*, Rong FAN2, Xiaolu MA2

  1. Academy of Applied Information Technology, Kyoto College of Graduate Studies for Informatics, Kyoto 606-8225, Japan
  2. School of Electrical and Information Engineering, Anhui University of Technology, Maanshan 243002, Anhui, China
  • Received: 2023-05-31 Online: 2024-03-15 Published: 2024-03-20
  • Contact: Fangxin XU
  • Supported by:
    National Natural Science Foundation of China (62172004); National Natural Science Foundation of China (61872004); Anhui Provincial Science and Technology Major Project (202003a05020028); Key Project of Natural Science Research of Universities in Anhui Province (KJ2019A0065); Wuhu Science and Technology Plan Project for Key Core Technology Research (2022hg10)

Abstract:

To address the missed and false detections that detectors tend to produce in crowded pedestrian scenes, this study proposes an improved YOLOv7 algorithm for crowded pedestrian detection. A BiFormer vision transformer and an improved Efficient Layer Aggregation Network module based on RepConv and a Channel Space Attention Module (CSAM), termed RC-ELAN, are introduced into the backbone network. Their self-attention mechanism and attention module make the backbone focus more on the important features of occluded pedestrians, effectively mitigating the adverse effect of missing target features on detection. The neck network is redesigned following the idea of the Bidirectional Feature Pyramid Network (BiFPN): transposed convolution and an improved Rep-ELAN-W module allow the model to use the small-target feature information in the middle- and low-dimensional feature maps efficiently, effectively improving the detection of small pedestrians. An Efficient Complete Intersection-over-Union (E-CIoU) loss function is introduced so that the model converges further to a higher accuracy. Experimental results on the WiderPerson dataset, which contains a large number of small and occluded pedestrians, show that, compared with the YOLOv7, YOLOv5, and YOLOX algorithms, the improved YOLOv7 algorithm raises the average precision at IoU thresholds of 0.5 and 0.5-0.95 by 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points, respectively, making it well suited to crowded pedestrian detection scenarios.
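The abstract relies on two IoU-based quantities: the IoU thresholds (0.5 and 0.5-0.95) at which average precision is reported, and a CIoU-style box regression loss. As a rough illustration only, the sketch below computes plain IoU and the standard Complete IoU (CIoU) loss for axis-aligned boxes. It is not the E-CIoU variant proposed in the paper, whose definition the abstract does not give. The function names, the `(x1, y1, x2, y2)` box format, and the 0.05 threshold step (the usual COCO convention) are assumptions made for this example.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes with positive area."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def ciou_loss(pred, target, eps=1e-9):
    """Standard CIoU loss: 1 - IoU + centre-distance term + aspect-ratio term."""
    i = iou(pred, target)
    # Squared distance between the two box centres.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / (1 - i + v + eps)
    return 1 - i + rho2 / c2 + alpha * v


def ap_iou_thresholds():
    """Thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style 0.05 step)."""
    return [round(0.5 + 0.05 * k, 2) for k in range(10)]
```

A prediction counts as a true positive at a given threshold only if its IoU with a ground-truth box reaches that threshold, so the 0.5-0.95 figure rewards tighter localization than the 0.5 figure.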

Key words: machine vision, crowded pedestrian detection, attention mechanism, YOLO series algorithms, Bidirectional Feature Pyramid Network (BiFPN)
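The neck described in the abstract combines transposed convolution with a BiFPN-inspired design. The sketch below is a toy illustration of those two ingredients in pure Python: a stride-2 transposed convolution with a 2x2 kernel to upsample a coarse feature map, followed by the fast normalized weighted fusion used in the original BiFPN design. The paper's actual neck (including the Rep-ELAN-W module) is not specified in the abstract, so the function names, kernel size, and single-channel list-of-lists layout here are all assumptions for illustration.

```python
def transposed_conv2x(x, kernel):
    """Stride-2 transposed convolution of a 2-D map with a 2x2 kernel (no padding).

    Each input value x[i][j] is scattered into the 2x2 output patch starting at
    (2*i, 2*j), scaled by the kernel, which doubles the spatial resolution.
    """
    rows, cols = len(x), len(x[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            for a in range(2):
                for b in range(2):
                    out[2 * i + a][2 * j + b] += x[i][j] * kernel[a][b]
    return out


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted fusion of equally sized 2-D maps: sum(w_i * F_i) / (eps + sum(w_j)).

    Weights are clipped at zero (ReLU) so each map contributes non-negatively.
    """
    w = [max(0.0, value) for value in weights]
    total = sum(w) + eps
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [sum(wi * f[r][c] for wi, f in zip(w, features)) / total for c in range(cols)]
        for r in range(rows)
    ]


# Usage: upsample a coarse 2x2 map and fuse it with a finer 4x4 map.
coarse = [[1, 2], [3, 4]]
fine = [[2.0] * 4 for _ in range(4)]
upsampled = transposed_conv2x(coarse, [[1, 1], [1, 1]])
fused = fast_normalized_fusion([fine, upsampled], [1.0, 1.0], eps=0.0)
```

With an all-ones kernel the transposed convolution reduces to nearest-neighbour upsampling; in a trained network both the kernel and the fusion weights would be learned.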