MS-ADFF: A multi-scale aggregation-diffusion feature fusion algorithm for pedestrian detection on waste unloading platform

doi:10.19678/j.issn.1000-3428.0253353

Abstract

Pedestrian detection on the unloading platforms of waste incineration power plants remains challenging because of complex lighting interference and large variations in pedestrian scale. Existing pedestrian detection methods are limited in shallow edge feature extraction, multi-scale feature fusion, and lightweight detection head design. To address these issues, this paper proposes MS-ADFF, a pedestrian detection model based on multi-scale aggregation-diffusion feature fusion. First, an edge feature enhancement module is designed; by reinforcing pedestrian contour information in shallow features, it mitigates the impact of blurred image detail under complex lighting. Second, a multi-scale aggregation-diffusion feature fusion network is constructed. It performs two rounds of feature aggregation and diffusion on the P3, P4, and P5 feature levels, integrating multi-scale semantic features and improving the model's ability to perceive pedestrians at different scales. Finally, a lightweight shared detection head built from depthwise convolution and group convolution is proposed. It replaces the traditional dual-branch structure with a shared feature extraction mechanism, suppressing redundant parameters while maintaining detection accuracy. Experimental results show that, with YOLOv11s as the baseline, MS-ADFF achieves a detection precision of 92.7% on the self-built WIPPID dataset, with Recall and mAP@0.5 improved by 4.6% and 1.5%, respectively, and floating-point operations reduced by 0.7 GFLOPs. On the public CityPersons dataset, MS-ADFF improves detection precision by 1.9% over the baseline, again with a reduction of 0.7 GFLOPs. These results show that, with fewer floating-point operations than the baseline, the proposed method improves pedestrian detection accuracy on the unloading platforms of waste incineration power plants, and that it also generalizes well and remains robust in street-scene pedestrian detection.
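
As a rough illustration of why a head built from grouped convolutions can carry fewer parameters, the sketch below counts the weights of a single 3x3 layer under different channel groupings. It is not the MS-ADFF detection head: the channel width of 256 and the helper `conv_params` are hypothetical, and the counts rest only on the standard formula for convolution weights (bias terms ignored). Depthwise convolution is treated as the special case of one group per channel, followed by a 1x1 pointwise layer to mix channels.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias (hypothetical helper)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


channels = 256  # assumed width, chosen only for illustration

standard = conv_params(channels, channels, 3)               # dense 3x3
grouped = conv_params(channels, channels, 3, groups=4)      # 4 channel groups
depthwise = conv_params(channels, channels, 3, groups=channels)
pointwise = conv_params(channels, channels, 1)              # 1x1 channel mixing
depthwise_separable = depthwise + pointwise

print(standard, grouped, depthwise_separable)
# prints: 589824 147456 67840
```

Under these assumptions the grouped layer holds a quarter of the dense layer's weights and the depthwise-separable pair holds about 11.5%. Sharing one such stack across the classification and regression branches, as the abstract describes, would avoid duplicating those weights per branch.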


LI Hao, MA Zhenzhe, CHENG Lan, XU Xinying. MS-ADFF: A multi-scale aggregation-diffusion feature fusion algorithm for pedestrian detection on waste unloading platform[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0253353.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0253353

