
Computer Engineering


Multi-Object Tracking Method Based on Temporal Prediction and Feature Extraction

  • Published: 2025-07-31

Abstract: Multi-object tracking (MOT) in computer vision faces challenges such as target occlusion and similar appearance among targets, which limit tracking accuracy and robustness. To address these problems, a new multi-object tracking method, TBSTrack, is proposed. The method consists of three core modules: temporal prediction, feature extraction, and stage-wise matching. The temporal prediction module builds a buffer of temporal information and applies a self-attention mechanism to compute the prediction for the current frame, strengthening the spatiotemporal association of targets and improving the accuracy of position prediction. The feature extraction module partitions occluded targets into blocks, uses a convolutional neural network (CNN) to extract features from each block, and then merges the block features according to the occlusion status, which removes interference and yields an effective representation of the target. The stage-wise matching module adopts a two-stage matching strategy: it first uses learnable anchors to recover missed targets during matching, and then mines potential targets from the background. The two sets of results are combined into the final tracking output, which is used to update the temporal information. The method is evaluated on the MOT17, DanceTrack, and SportsMOT datasets, where it achieves HOTA scores of 63.9%, 57.3%, and 75.6% and IDF1 scores of 79.6%, 56.7%, and 78.8%, respectively. The experimental results indicate that the method improves the accuracy and robustness of multi-object tracking, particularly in complex scenes.
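The two-stage matching idea in the abstract can be illustrated with a minimal sketch. It is not the TBSTrack implementation: the abstract does not specify the matching details, so greedy IoU matching over confidence-split detections is substituted here for the learnable anchors, and the function names and the thresholds (`high`, `low`, `min_iou`) are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: Dict[int, Box],
                 min_iou: float) -> Dict[int, int]:
    """Pair tracks with detections in order of decreasing IoU."""
    pairs = sorted(
        ((iou(t, d), tid, did) for tid, t in tracks.items() for did, d in dets.items()),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, did in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same target
        if tid in matches or did in used:
            continue
        matches[tid] = did
        used.add(did)
    return matches


def two_stage_match(predicted: Dict[int, Box], detections: List[Box],
                    scores: List[float], high: float = 0.6, low: float = 0.1,
                    min_iou: float = 0.3) -> Dict[int, int]:
    """Return {track_id: detection_index} after two matching stages.

    `predicted` holds the boxes a temporal predictor produced for the
    current frame (assumed given). Thresholds are illustrative only.
    """
    strong = {i: b for i, b in enumerate(detections) if scores[i] >= high}
    weak = {i: b for i, b in enumerate(detections) if low <= scores[i] < high}
    # Stage 1: match predicted track positions to confident detections.
    first = greedy_match(predicted, strong, min_iou)
    # Stage 2: try to recover unmatched tracks from low-confidence
    # detections that would otherwise be treated as background.
    leftover = {tid: b for tid, b in predicted.items() if tid not in first}
    second = greedy_match(leftover, weak, min_iou)
    return {**first, **second}


if __name__ == "__main__":
    predicted = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
    detections = [(1, 1, 11, 11), (21, 20, 31, 30), (50, 50, 60, 60)]
    scores = [0.9, 0.3, 0.8]
    print(two_stage_match(predicted, detections, scores))  # {1: 0, 2: 1}
```

In this example, track 2 overlaps only a low-confidence detection, so it is recovered in the second stage. Detection 2 matches no track and would typically start a new one.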