
Computer Engineering, 2024, Vol. 50, Issue (7): 352-359. doi: 10.19678/j.issn.1000-3428.0068140

• Development Research and Engineering Application •

Athlete Detection Algorithm Based on Multi-scale Linear Global Attention

Zhiwei LIN, Zuyuan YANG*, Siqiu WANG, Chao YANG

  1. Guangdong Key Laboratory of IoT Information Technology, School of Automation, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2023-07-24  Published: 2024-07-15  Online: 2023-12-05
  • Corresponding author: Zuyuan YANG
  • Funding:
    National Natural Science Foundation of China (U1911401); Guangdong Basic and Applied Basic Research Foundation Joint Fund, General Program (2022A1515010688)

Abstract:

Athletes move quickly and are frequently occluded during a game. As a result, athlete detection in video is prone to missed detections, duplicate or false detections, and reduced accuracy. Current mainstream methods perform poorly when athletes are moving or occluded, and occlusion increases the scale variation of the target bounding boxes.

This study introduces Cutout as a data augmentation method to simulate occlusion. It then proposes an athlete detection algorithm based on the multi-scale linear global attention EfficientViT module. The linear global attention module reduces the amount of computation, and a convolution module is added alongside it to strengthen local feature extraction. Lightweight small-kernel convolutions aggregate the tokens of the different attention heads to obtain multi-scale information, which strengthens global feature extraction. For the loss function, EIoU is selected as the bounding box loss. It adds the width and height distances between the predicted box and the ground-truth box, so that the two boxes agree more closely in scale.

Experiments were run on four publicly available basketball game videos from the SportsMOT dataset. The proposed algorithm achieves a precision of 98.0% and a mean Average Precision (mAP) of 98.2%. Compared with the original YOLOv5 algorithm, precision is 4% higher and high-confidence mAP is 8.7% higher.

Key words: YOLOv5 algorithm, athlete detection, multi-scale linear global attention, data augmentation, bounding box loss
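To make the occlusion-simulation idea concrete, here is a minimal sketch of Cutout in plain Python. It assumes a single-channel frame stored as nested lists. The function name `cutout` and its `size` and `rng` parameters are illustrative choices, not taken from the paper. Following the original Cutout recipe, the patch centre is sampled anywhere in the image and the square is clipped at the borders.

```python
import random


def cutout(image, size, rng=random):
    """Return a copy of `image` (H x W nested lists) with a square patch zeroed.

    The patch centre is sampled uniformly over the whole image and the patch
    is clipped at the borders, so patches near an edge are only partly inside.
    """
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    half = size // 2
    y0, y1 = max(0, cy - half), min(h, cy + half)
    x0, x1 = max(0, cx - half), min(w, cx + half)
    out = [row[:] for row in image]  # copy rows so the input frame is untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0  # blank the pixel to mimic an occluding object
    return out


if __name__ == "__main__":
    frame = [[255] * 8 for _ in range(8)]
    for row in cutout(frame, size=4, rng=random.Random(0)):
        print(row)
```

In detection training, the bounding-box labels would normally be left unchanged, so the network has to locate a player even when part of the body is blanked out. The abstract does not state the patch size or the number of patches the authors used.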
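The abstract says the linear global attention module reduces computation but does not give its formula. This sketch assumes the module follows the ReLU-based linear attention of the EfficientViT paper by Cai et al.

Under that assumption, the saving comes from reordering the matrix products:
- The standard order scores all N x N query/key pairs.
- The reordered version computes the key/value summaries `sum_j relu(k_j)^T v_j` and `sum_j relu(k_j)` once, then reuses them for every query.
- Cost therefore grows linearly with the number of tokens N.

The function names are hypothetical. Both orderings are implemented so they can be compared.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relu_attention_quadratic(Q, K, V, eps=1e-6):
    """Reference order: build every query-key score explicitly, O(N^2 * d)."""
    out = []
    for q in Q:
        qr = relu(q)
        scores = [dot(qr, relu(k)) for k in K]
        z = sum(scores) + eps  # normaliser replaces softmax
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / z
                    for c in range(len(V[0]))])
    return out


def relu_attention_linear(Q, K, V, eps=1e-6):
    """Reordered products: summarise keys/values once, O(N * d * dv)."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # sum_j relu(k_j)^T v_j, shape d x dv
    s = [0.0] * d                       # sum_j relu(k_j), shape d
    for k, v in zip(K, V):
        kr = relu(k)
        for a in range(d):
            s[a] += kr[a]
            for c in range(dv):
                S[a][c] += kr[a] * v[c]
    out = []
    for q in Q:
        qr = relu(q)
        z = dot(qr, s) + eps  # same normaliser as the quadratic form
        out.append([sum(qr[a] * S[a][c] for a in range(d)) / z
                    for c in range(dv)])
    return out
```

The two functions return the same values up to floating-point rounding. Only the linear one avoids materialising the N x N score matrix, which is why this form suits high-resolution video frames with many tokens.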
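As a rough illustration of obtaining multi-scale information by aggregating tokens with a lightweight small convolution, the sketch below makes several assumptions:
- Tokens form an H x W x C grid stored as nested lists.
- A k x k convolution with zero padding lets each token absorb its neighbours.
- The original and aggregated tokens are concatenated along the channel axis, giving a two-scale token set.

In EfficientViT, the aggregated Q/K/V tokens are reported to feed extra attention heads. Here the kernel is fixed and shared across channels, whereas a real depthwise layer learns one kernel per channel. The function names and the 3 x 3 mean kernel in the usage note are hypothetical.

```python
def aggregate_tokens(tokens, kernel):
    """k x k convolution with zero padding over an H x W grid of C-dim tokens.

    Simplification: one kernel is shared by all channels; a depthwise layer
    would learn a separate k x k kernel for each channel.
    """
    h, w, c = len(tokens), len(tokens[0]), len(tokens[0][0])
    k = len(kernel)
    p = k // 2  # padding that keeps the H x W grid size for odd k
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - p, x + dx - p
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        for ch in range(c):
                            out[y][x][ch] += kernel[dy][dx] * tokens[yy][xx][ch]
    return out


def multi_scale_tokens(tokens, kernel):
    """Concatenate each original token with its locally aggregated version."""
    agg = aggregate_tokens(tokens, kernel)
    return [[tokens[y][x] + agg[y][x] for x in range(len(tokens[0]))]
            for y in range(len(tokens))]
```

With a 3 x 3 grid of one-channel tokens all equal to 1.0 and a 3 x 3 kernel of 1/9, each output token has 2 channels. The aggregated channel is 1.0 at the centre and 4/9 at a corner, where only four neighbours fall inside the grid. Because each aggregated token summarises a small neighbourhood, attention computed over these tokens sees a coarser scale than attention over the raw tokens.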
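The EIoU loss named in the abstract (Zhang et al., "Focal and Efficient IOU Loss") extends the IoU loss with three penalty terms:

L_EIoU = 1 - IoU + rho^2(b, b_gt) / c^2 + rho^2(w, w_gt) / C_w^2 + rho^2(h, h_gt) / C_h^2

Here rho is the Euclidean distance, b and b_gt are the two box centres, c is the diagonal of the smallest box enclosing both, and C_w and C_h are that enclosing box's width and height. The sketch below assumes corner-format boxes `(x1, y1, x2, y2)`. YOLOv5 works in centre/size format internally, so a conversion step would be needed there. The function name is hypothetical.

```python
def eiou_loss(pred, gt, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Plain IoU term.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)
    # Smallest enclosing box.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    # Squared centre distance, normalised by the enclosing diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    centre_term = rho2 / (cw ** 2 + ch ** 2 + eps)
    # Width and height gaps: the terms that pull the box toward the true scale.
    width_term = (pw - gw) ** 2 / (cw ** 2 + eps)
    height_term = (ph - gh) ** 2 / (ch ** 2 + eps)
    return 1 - iou + centre_term + width_term + height_term
```

Take a prediction `(0, 0, 4, 2)` against a ground truth `(0, 0, 2, 2)`. IoU is 0.5, the centre term is 1/20 and the width term is 4/16, giving a loss of about 0.8. The width term alone contributes 0.25 here. This is the component the abstract credits with bringing predicted and real boxes closer in scale.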