Athlete Detection Algorithm Based on Multi-scale Linear Global Attention

doi:10.19678/j.issn.1000-3428.0068140

Abstract:

The rapid movement and frequent occlusion of athletes during competitions cause missed detections, duplicate detections and reduced accuracy when detecting athletes in video. Current mainstream detection methods perform poorly on moving and occluded athletes. When athletes are occluded, the scale of their detection bounding boxes varies more widely. In this study, cutout is introduced as a data augmentation method to simulate occlusion, and an athlete detection algorithm based on the multi-scale linear global attention EfficientViT module is proposed. A linear global attention module reduces the amount of computation, and convolution modules are added to strengthen local feature extraction. Tokens from different attention heads are aggregated by lightweight small-kernel convolutions to obtain multi-scale information, which strengthens global feature extraction. EIoU is selected as the bounding box loss. It adds the width and height distances between the predicted and ground-truth boxes, so the two boxes match more closely in scale. Experiments on four publicly available basketball game videos from the SportsMOT dataset show that the proposed algorithm achieves a precision of 98.0% and a mean Average Precision (mAP) of 98.2%. Compared with the original YOLOv5 algorithm, its precision is 4% higher and its high-confidence mAP is 8.7% higher.
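The abstract states that linear global attention reduces computation. The pure-Python sketch below shows where the saving comes from. It assumes the ReLU feature map used in Cai et al.'s EfficientViT, which replaces softmax similarity with ReLU(Q)ReLU(K)ᵀ. That product can be regrouped as ReLU(Q)(ReLU(K)ᵀV), so the N × N token-similarity matrix is never built and the cost falls from O(N²d) to O(Nd²).

The function names are hypothetical and this is not the authors' code. The multi-scale token aggregation and the convolution branch are omitted.

```python
def relu_mat(a):
    """Element-wise ReLU on a nested-list matrix."""
    return [[max(0.0, x) for x in row] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """(n x k) @ (k x m) for nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def quadratic_attention(q, k, v, eps=1e-6):
    """Reference O(N^2 d) form: builds the full N x N similarity matrix."""
    qp, kp = relu_mat(q), relu_mat(k)
    sim = matmul(qp, transpose(kp))  # N x N
    out = []
    for row in sim:
        denom = sum(row) + eps  # normalise each query's weights
        out.append([sum(row[j] * v[j][c] for j in range(len(v))) / denom
                    for c in range(len(v[0]))])
    return out

def linear_attention(q, k, v, eps=1e-6):
    """O(N d^2) form: ReLU(K)^T V is computed once and shared by all queries."""
    qp, kp = relu_mat(q), relu_mat(k)
    kv = matmul(transpose(kp), v)             # d x d_v, size independent of N
    k_sum = [sum(col) for col in zip(*kp)]    # sum over tokens of ReLU(K_j)
    out = []
    for qi in qp:
        num = [sum(qi[t] * kv[t][c] for t in range(len(kv)))
               for c in range(len(v[0]))]
        denom = sum(qi[t] * k_sum[t] for t in range(len(k_sum))) + eps
        out.append([x / denom for x in num])
    return out

if __name__ == "__main__":
    q = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]
    k = [[0.5, 1.0], [1.5, -0.2], [0.1, 0.4]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(quadratic_attention(q, k, v))
    print(linear_attention(q, k, v))
```

Both functions return the same values up to floating-point rounding. Only the order of the matrix products differs, and that is what makes the global attention affordable on high-resolution feature maps.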

Key words: YOLOv5 algorithm, athlete detection, multi-scale linear global attention, data augmentation, bounding box loss
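The EIoU bounding box loss mentioned in the abstract can be written as L = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/C_w² + ρ²(h, h^gt)/C_h². Here c is the diagonal of the smallest box enclosing both boxes, and C_w and C_h are that box's width and height.

The sketch below evaluates this loss for a single box pair. It assumes corner format (x1, y1, x2, y2) and computes no gradients. The name `eiou_loss` is hypothetical, not the authors' implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for one predicted box and one ground-truth box in (x1, y1, x2, y2) format."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU from intersection and union areas.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Smallest enclosing box: width, height and squared diagonal.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between box centres (same term as DIoU).
    rho2 = ((px1 + px2 - tx1 - tx2) / 2) ** 2 + ((py1 + py2 - ty1 - ty2) / 2) ** 2

    # Extra EIoU terms: width and height gaps, normalised by the enclosing box.
    w_term = (pw - tw) ** 2 / (cw ** 2 + eps)
    h_term = (ph - th) ** 2 / (ch ** 2 + eps)

    return 1.0 - iou + rho2 / c2 + w_term + h_term

if __name__ == "__main__":
    print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(eiou_loss((0, 0, 4, 2), (1, 0, 3, 2)))  # same centre, box twice as wide
```

In the second call the centres coincide, so a DIoU-style loss would stop at 1 − 0.5 = 0.5. The width term adds (4 − 2)²/4² = 0.25, giving 0.75. This direct penalty on scale mismatch is the motivation the abstract gives for choosing EIoU.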

Zhiwei LIN, Zuyuan YANG, Siqiu WANG, Chao YANG. Athlete Detection Algorithm Based on Multi-scale Linear Global Attention[J]. Computer Engineering, 2024, 50(7): 352-359.

https://www.ecice06.com/CN/Y2024/V50/I7/352

Figures/Tables (11)

Fig.1 Structure of the lightweight multi-scale linear global attention module

Fig.2 Aggregation process for generating multi-scale tokens

Fig.3 Structure of the YOLOv5 algorithm based on multi-scale linear global attention

Fig.4 Example of using cutout for data augmentation

Fig.5 Loss curves of different backbone networks

Fig.6 Loss curves of different IoU losses

Fig.7 Loss curves with and without cutout

Fig.8 Heat maps of different algorithms

Fig.9 Detection results under occlusion with and without the improved method
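Cutout (Fig.4) is the augmentation the abstract uses to simulate occlusion. The pure-Python sketch below masks one square patch at a random location, following DeVries and Taylor's formulation. The patch centre is sampled uniformly over the image, so the patch may be clipped at the border.

The sketch makes these assumptions, none of which come from the source:
- the image is an H × W nested list;
- the fill value is 0;
- one patch is masked per image;
- bounding box labels are left unchanged.

The patch size and fill value used in the paper are not stated here, and `cutout` is a hypothetical name.

```python
import random

def cutout(image, size, rng=None, fill=0):
    """Return (masked_copy, box): one size x size square of `image` set to `fill`.

    `image` is an H x W nested list (a scalar or a pixel tuple per cell).
    `box` is the clipped patch as (x1, y1, x2, y2), end-exclusive.
    """
    if rng is None:
        rng = random.Random()
    h, w = len(image), len(image[0])
    # Centre drawn anywhere in the image, so the patch can extend past the border.
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, left = cy - size // 2, cx - size // 2
    y1, y2 = max(0, top), min(h, top + size)
    x1, x2 = max(0, left), min(w, left + size)
    out = [list(row) for row in image]  # copy rows so the input is not modified
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = fill
    return out, (x1, y1, x2, y2)

if __name__ == "__main__":
    img = [[1] * 8 for _ in range(8)]
    masked, box = cutout(img, size=4, rng=random.Random(0))
    for row in masked:
        print(row)
    print("masked patch (x1, y1, x2, y2):", box)
```

Passing a seeded `random.Random` makes the mask reproducible. Omitting it draws a fresh patch on every call, which is the usual behaviour during training.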
