
Computer Engineering ›› 2021, Vol. 47 ›› Issue (8): 234-242. doi: 10.19678/j.issn.1000-3428.0058745

• Graphics and Image Processing •

YOLO-Based Multi-Modal Weighted Fusion Pedestrian Detection Algorithm

SHI Zheng, MAO Li, SUN Jun

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2020-06-24 Revised: 2020-08-11 Published: 2020-08-21
  • About the authors: SHI Zheng (born 1997), male, master's student, whose main research interest is deep learning; MAO Li, associate professor; SUN Jun, professor.
  • Funding:
    National Natural Science Foundation of China (61672263).

Abstract: Single-modal pedestrian detection algorithms based on visible-light images perform poorly when lighting is insufficient at night, when occlusion causes missing target information, and when pedestrian targets appear at multiple scales. To improve the robustness of pedestrian detectors, a YOLO-based pedestrian detection algorithm that fuses visible and infrared light is proposed. Darknet53 is used as the feature extraction network to extract multi-scale features from each of the two modalities separately. To improve on the concat fusion used by conventional multi-modal pedestrian detection algorithms, a modal weighted fusion layer incorporating an attention mechanism is designed to strengthen modality selection in the fused feature maps. The multi-scale fused features are then used for pedestrian detection. Experimental results show that modal weighted fusion is considerably more accurate than concat fusion, and that the proposed algorithm performs well under insufficient night-time lighting, target occlusion, and multi-scale targets. On the KAIST dataset, its detection accuracy exceeds that of HalFusion, Fusion RPN+BDT, and other algorithms, and its detection speed is also considerably higher.

Key words: pedestrian detection, target detection, multi-modal algorithm, YOLO network, attention mechanism
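
The abstract contrasts concat fusion with a modal weighted fusion layer but does not describe the layer's internals. The following is a minimal illustrative sketch of the general idea only, not the authors' layer. It assumes each modality receives a single scalar weight obtained by global average pooling followed by a softmax across the two modalities. All function names, and the toy 1-channel 2×2 feature maps, are hypothetical.

```python
import math

# A feature map is a list of channels; each channel is a list of rows of floats.

def global_average(feature_map):
    """Average every activation in the feature map into one scalar."""
    values = [v for channel in feature_map for row in channel for v in row]
    return sum(values) / len(values)

def modal_weights(visible, infrared):
    """Assumed attention: softmax over one pooled score per modality."""
    scores = [global_average(visible), global_average(infrared)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scale(feature_map, weight):
    """Multiply every activation by a scalar weight."""
    return [[[weight * v for v in row] for row in channel]
            for channel in feature_map]

def concat_fusion(visible, infrared):
    """Baseline: join the two feature maps along the channel axis."""
    return visible + infrared

def weighted_fusion(visible, infrared):
    """Re-weight each modality before joining along the channel axis."""
    w_vis, w_ir = modal_weights(visible, infrared)
    return scale(visible, w_vis) + scale(infrared, w_ir)

# Toy night-time case: weak visible response, strong infrared response.
visible = [[[0.1, 0.1], [0.1, 0.1]]]
infrared = [[[0.9, 0.9], [0.9, 0.9]]]
weights = modal_weights(visible, infrared)
fused = weighted_fusion(visible, infrared)
```

In this sketch the channel count of the fused map matches that of concat fusion; the only difference is that the modality with the stronger pooled response contributes more. A learned attention branch, as the abstract implies, would replace the fixed pooling-plus-softmax rule used here.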

CLC number: