[1] WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]//Proceedings of IEEE International Conference on Image Processing. Washington D. C., USA: IEEE Press, 2017: 3645-3649.
[2] ZHANG Y F, SUN P Z, JIANG Y, et al. ByteTrack: multi-object tracking by associating every detection box[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 1-21.
[3] CHEN L H, SU C W, HSIAO H A. Player trajectory reconstruction for tactical analysis. Multimedia Tools and Applications, 2018, 77(23): 30475-30486. doi: 10.1007/s11042-018-6164-5
[4] SEMPAU J, WILDERMAN S J, BIELAJEW A F. DPM, a fast, accurate Monte Carlo code optimized for photon and electron radiotherapy treatment planning dose calculations. Physics in Medicine & Biology, 2000, 45(8): 2263-2291.
[5] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: ACM Press, 2014: 580-587.
[6] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE Press, 2016: 779-788.
[7] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE Press, 2017: 7263-7271.
[8] ZHAO L Q, LI S Y. Object detection algorithm based on improved YOLOv3. Electronics, 2020, 9(3): 537. doi: 10.3390/electronics9030537
[9] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE Press, 2017: 2117-2125.
[10]
[11] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 8759-8768.
[12] WU S, ZHOU F. Small target detection based on improved SSD algorithm. Computer Engineering, 2023, 49(7): 179-188, 195. (in Chinese)
[13] SONG H W, QU X J, YANG X, et al. Flame and smoke detection based on improved YOLOv5. Computer Engineering, 2023, 49(6): 250-256. (in Chinese)
[14] KATHAROPOULOS A, VYAS A, PAPPAS N, et al. Transformers are RNNs: fast autoregressive transformers with linear attention[EB/OL]. [2023-06-20]. http://arxiv.org/abs/2006.16236v3.
[15] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of NIPS'17. Cambridge, USA: MIT Press, 2017: 30-41.
[17]
[18] ZHANG Y F, REN W Q, ZHANG Z, et al. Focal and efficient IoU loss for accurate bounding box regression. Neurocomputing, 2022, 506: 146-157. doi: 10.1016/j.neucom.2022.07.042
[19] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE Press, 2018: 4510-4520.
[20] CAI H, GAN C, HAN S. EfficientViT: enhanced linear attention for high-resolution low-computation visual recognition[EB/OL]. [2023-06-20]. http://arxiv.org/abs/2205.14756.
[21] MSONDA P, UYMAZ S A, KARAAGAC S S. Spatial pyramid pooling in deep convolutional networks for automatic tuberculosis diagnosis. Traitement Du Signal, 2020, 37(6): 1075-1084. doi: 10.18280/ts.370620
[22] ZHENG Z, WANG P, REN D, et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transactions on Cybernetics, 2022, 52(8): 8574-8586. doi: 10.1109/TCYB.2021.3095305
[23] CUI Y T, ZENG C K, ZHAO X Y, et al. SportsMOT: a large multi-object tracking dataset in multiple sports scenes[EB/OL]. [2023-06-20]. http://arxiv.org/abs/2304.05170v2.
[24]
[25] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE Press, 2018: 6848-6856.
[26] LI Y, YUAN G, WEN Y, et al. EfficientFormer: vision transformers at MobileNet speed. Advances in Neural Information Processing Systems, 2022, 35: 12934-12949.
[27] YU W H, LUO M, ZHOU P, et al. MetaFormer is actually what you need for vision[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE Press, 2022: 10819-10829.