
Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 265-274. doi: 10.19678/j.issn.1000-3428.0065701

• Development Research and Engineering Application •

Mask Wearing Detection Algorithm Based on Improved YOLOv5

Xinyi ZHANG, Fei ZHANG, Bin HAO, Lu GAO, Xiaoying REN

  1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014000, Inner Mongolia, China
  • Received:2022-09-07 Online:2023-08-15 Published:2022-12-07
  • About the authors:

    ZHANG Xinyi (born 1998), female, M.S. candidate; her main research interest is image processing

    ZHANG Fei, associate professor, Ph.D.

    HAO Bin, lecturer, Ph.D.

    GAO Lu, associate professor, M.S.

    REN Xiaoying, lecturer

  • Funding:
    Science and Technology Program of Inner Mongolia Autonomous Region (2021GG0046); Science and Technology Program of Inner Mongolia Autonomous Region (2021GG0048)

Abstract:

In dense crowd scenes in public places, face mask wearing detection algorithms perform poorly because of the information lost to target occlusion and because the targets are small and of low resolution. To improve the detection accuracy and speed of the model and to reduce its hardware footprint, an improved mask wearing detection algorithm based on YOLOv5s is proposed. Conventional convolution is replaced with Ghost Shuffle Convolution (GSConv), which combines Standard Convolution (SConv) and Depth-Wise separable Convolution (DWConv) with channel shuffling, improving network speed while maintaining accuracy. Nearest-neighbor upsampling is replaced with a lightweight universal upsampling operator to make full use of semantic feature information, and Adaptive Spatial Feature Fusion (ASFF) is added at the end of the Neck of the improved YOLOv5s model, allowing features at different scales to be fused more effectively and improving detection accuracy. In addition, adaptive image sampling is used to alleviate data imbalance, and Mosaic data augmentation is used to make full use of small targets. Experimental results show that the algorithm achieves a mean Average Precision (mAP) of 93% on the AIZOO dataset, a 2 percentage point improvement over the original YOLOv5 model, and a detection precision of 97.7% for faces wearing masks, outperforming the YOLO series, SSD, and RetinaFace under the same conditions. Inference speed on a GPU is improved by 16.7 percentage points, and the model weight file occupies only 23.5 MB, making the algorithm suitable for real-time mask wearing detection.
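
The GSConv operation described above can be illustrated with a short PyTorch sketch: a standard convolution produces half of the output channels, a depth-wise convolution applied to that result produces the other half, and the concatenated channels are shuffled. The module name, layer sizes, kernel choices, and activation below are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn


class GSConvSketch(nn.Module):
    """Minimal sketch of a GSConv-style block: standard conv + depth-wise conv + channel shuffle."""

    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        # Standard convolution produces half of the output channels.
        self.std_conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        # Depth-wise convolution (groups == channels) produces the other half cheaply.
        self.dw_conv = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.std_conv(x)
        b = self.dw_conv(a)
        y = torch.cat((a, b), dim=1)
        # Channel shuffle: interleave the two halves so the dense and
        # depth-wise feature maps exchange information across channels.
        n, c, h, w = y.shape
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(GSConvSketch(64, 128)(x).shape)  # expected: torch.Size([1, 128, 80, 80])

The shuffle step is what separates this from a plain Ghost-style block: without it, the depth-wise half would remain isolated from the standard-convolution half in subsequent 1x1 convolutions.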

Key words: mask wearing detection, YOLOv5s model, Ghost Shuffle Convolution (GSConv), Adaptive Spatial Feature Fusion (ASFF), lightweight universal upsampling operator
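
As a companion to the abstract, the following is a minimal PyTorch sketch of the Adaptive Spatial Feature Fusion (ASFF) idea over three Neck outputs: each scale is resized to a common resolution, a 1x1 convolution predicts a per-pixel logit for each scale, and a softmax over scales yields fusion weights that sum to one at every spatial position. The channel count, resizing mode, and use of a single shared output map are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ASFFSketch(nn.Module):
    """Minimal sketch of adaptive spatial feature fusion over three scales."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per scale predicts a per-pixel importance logit.
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)]
        )

    def forward(self, feats):
        # feats: three maps with the same channel count but different spatial sizes.
        target_size = feats[0].shape[-2:]
        resized = [
            f if f.shape[-2:] == target_size
            else F.interpolate(f, size=target_size, mode="nearest")
            for f in feats
        ]
        # Softmax across the scale dimension -> per-pixel fusion weights.
        logits = torch.cat(
            [conv(f) for conv, f in zip(self.weight_convs, resized)], dim=1
        )
        weights = torch.softmax(logits, dim=1)
        return sum(weights[:, i:i + 1] * resized[i] for i in range(3))


if __name__ == "__main__":
    p3 = torch.randn(1, 128, 80, 80)
    p4 = torch.randn(1, 128, 40, 40)
    p5 = torch.randn(1, 128, 20, 20)
    print(ASFFSketch(128)([p3, p4, p5]).shape)  # expected: torch.Size([1, 128, 80, 80])

Because the weights are predicted per pixel, a location dominated by a small face can draw mostly on the high-resolution scale while larger faces rely on the coarser scales, which is the kind of scale-aware fusion the abstract attributes to adding ASFF at the end of the Neck.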