Mask-YOLO: Improved Mask Detection Algorithm Based on YOLOv5n

doi:10.19678/j.issn.1000-3428.0069311

Abstract

As basic personal protective equipment, masks play an increasingly significant role in public health, yet existing mask detection algorithms suffer from low precision in complex scenes. To improve detection precision and training stability, this study proposes Mask-YOLO, an improved lightweight mask detection algorithm based on YOLOv5n. First, the Softplus activation function is used in the convolutional blocks of the feature-extraction backbone, which improves the model's nonlinear mapping and speeds up convergence during training. Second, Coordinate Attention is added to the deep layers of the backbone; by embedding positional information into channel attention, it helps the model capture more of the target region and more channel features without high memory overhead. Third, the Spatial Pyramid Pooling Fast (SPPF) module in the deep network is replaced with the Receptive Field Block (RFB) module, which uses multiple dilation rates to enlarge the receptive field of the convolutional features and obtain rich semantic information about the object. Fourth, a weighted BiFPN-style design is introduced on top of the original PANet multi-scale feature fusion, so that object features at different scales are fused and exchanged both semantically and spatially, further improving the precision of small-object detection. Fifth, the Distance Intersection over Union (DIoU) loss is used for bounding-box regression to address unstable regression and missed detections. Finally, Soft-NMS is employed to further improve detection efficiency by lowering the confidence scores of overlapping predicted bounding boxes. Experimental results show that Mask-YOLO improves mAP@0.95 by 8.58% over the baseline YOLOv5n, addressing the original model's low precision on small objects, unstable bounding-box regression, and slow convergence during training, and achieves efficient mask detection.
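
The last two components named in the abstract, the DIoU regression loss and Soft-NMS, can be illustrated with a short sketch. This is not the authors' implementation: it assumes boxes in (x1, y1, x2, y2) corner format, uses the Gaussian score decay from the Soft-NMS paper (the abstract does not say which variant Mask-YOLO uses), and the function names and the sigma and score_thresh defaults are illustrative choices.

```python
import math


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def diou_loss(pred, target):
    """DIoU loss: 1 - IoU + (centre distance)^2 / (enclosing-box diagonal)^2."""
    # Squared distance between the two box centres.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    d2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2
    penalty = d2 / c2 if c2 > 0 else 0.0
    return 1.0 - iou(pred, target) + penalty


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant): decay the scores of boxes that
    overlap the current best box instead of discarding them outright.
    Returns a list of (box_index, final_score) in selection order."""
    remaining = list(enumerate(scores))
    keep = []
    while remaining:
        # Select the remaining box with the highest (possibly decayed) score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        keep.append((best_idx, best_score))
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            score *= math.exp(-(overlap * overlap) / sigma)
            # Drop boxes whose score has decayed below the threshold.
            if score >= score_thresh:
                decayed.append((idx, score))
        remaining = decayed
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(diou_loss(boxes[0], boxes[1]))
    print(soft_nms(boxes, scores))
```

In this toy input the second box overlaps the first heavily, so its score is reduced and it is ranked after the non-overlapping third box, whereas hard NMS with a typical threshold would remove it entirely. The centre-distance penalty in diou_loss stays non-zero even when the boxes do not overlap, which is the property usually credited with giving DIoU a useful gradient where plain IoU loss is flat.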

Key words: object detection, mask detection, feature fusion, YOLOv5n, Feature Pyramid Network (FPN)

LI Yi, XU Huiying, ZHU Xinzhong, HUANG Xiao, WANG Shumeng, LI Xiyu. Mask-YOLO: Improved Mask Detection Algorithm Based on YOLOv5n[J]. Computer Engineering, 2025, 51(6): 297-310.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069311

https://www.ecice06.com/EN/Y2025/V51/I6/297

Figures/Tables 19

Fig.1 YOLOv5n network structure

Fig.2 Mask-YOLO network structure

Fig.3 Different activation functions

Fig.4 Classification of attention mechanism

Fig.5 SE attention mechanism

Fig.6 CA attention mechanism

Fig.7 Inception network structure

Fig.8 RFB network structure

Fig.9 Types of multi-scale feature map detection

Fig.10 Schematic diagram of IoU calculation process

Fig.11 MOXA dataset

Fig.12 Procedure of Mask-YOLO algorithm

Fig.13 Visualization comparison of two methods
