[1] ZOU Z, SHI Z, GUO Y, et al. Object detection in 20 years: a survey[EB/OL]. [2021-03-05]. https://arxiv.org/pdf/1905.05055.pdf.
[2] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[3] SEJNOWSKI T J. The deep learning revolution[M]. Cambridge, USA: MIT Press, 2018.
[4] JI Y Z, ZHANG H J, ZHANG Z, et al. CNN-based encoder-decoder networks for salient object detection: a comprehensive review and recent advances[J]. Information Sciences, 2021, 546: 835-857.
[5] LIU L L, ZHANG H J, XU X F, et al. Collocating clothes with generative adversarial networks cosupervised by categories and attributes: a multidiscriminator framework[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(9): 3540-3554.
[6] GAO X J, ZHANG Z, MU T T, et al. Self-attention driven adversarial similarity learning network[J]. Pattern Recognition, 2020, 105: 107331.
[7] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2021-03-05]. https://arxiv.org/abs/1409.1556.
[8] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2016: 770-778.
[9] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 2261-2269.
[10] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[EB/OL]. [2021-03-05]. https://arxiv.org/abs/1602.07360.
[11] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[12] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2021-03-05]. https://arxiv.org/pdf/1704.04861.pdf.
[13] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 6848-6856.
[14] WANG J, FENG S C, CHENG Y. Survey of research on lightweight neural network structures for deep learning[J]. Computer Engineering, 2021, 47(8): 1-13. (in Chinese)
[15] DENG H X. Method of mask wearing detection based on transfer learning and RetinaNet[J]. Electronic Technology & Software Engineering, 2020(5): 209-211. (in Chinese)
[16] WANG Y H, DING H W, LI B, et al. Mask wearing detection algorithm based on improved YOLOv3 in complex scenes[J]. Computer Engineering, 2020, 46(11): 12-22. (in Chinese)
[17] TAN S L, BIE X B, LU G L, et al. Real-time detection for mask-wearing of personnel based on YOLOv5 network model[J]. Laser Journal, 2021, 42(2): 147-150. (in Chinese)
[18] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2016: 779-788.
[19] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 6517-6525.
[20] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2021-03-05]. https://arxiv.org/abs/1804.02767.
[21] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2021-03-05]. https://arxiv.org/abs/2004.10934.
[22] TAN M X, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. [2021-03-05]. https://arxiv.org/abs/1905.11946.
[23] WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 1571-1580.
[24] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[25] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 8759-8768.
[26] HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 1577-1586.
[27] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 658-666.
[28] ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12993-13000.
[29] YUN S, HAN D, CHUN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2019: 6022-6031.
[30] XIONG R B, YANG Y C, HE D, et al. On layer normalization in the transformer architecture[EB/OL]. [2021-03-05]. https://arxiv.org/abs/2002.04745.
[31] LOSHCHILOV I, HUTTER F. SGDR: stochastic gradient descent with warm restarts[EB/OL]. [2021-03-05]. https://arxiv.org/abs/1608.03983.