Multi-Object Tracking Method Based on Temporal Prediction and Feature Extraction

doi:10.19678/j.issn.1000-3428.0252164

Abstract

Multi-object tracking (MOT) in computer vision is challenged by target occlusion and by targets with similar appearance, both of which limit tracking accuracy and robustness. To address these problems, a new multi-object tracking method, TBSTrack, is proposed. The method consists of three core modules: temporal prediction, feature extraction, and stage-wise matching. The temporal prediction module maintains a buffer of temporal information and applies a self-attention mechanism over it to compute the prediction for the current frame, which strengthens the spatiotemporal association of targets and improves position prediction. The feature extraction module partitions occluded targets into blocks, uses a convolutional neural network (CNN) to extract features from each block, and then merges the block features according to the occlusion status, which suppresses interference from occluders and yields a cleaner target representation. The stage-wise matching module adopts a two-stage strategy: it first uses learnable anchors to recover targets missed during matching, and then mines potential targets from the background. The two sets of results are combined into the final tracking output, which is then used to update the temporal information. Experiments are conducted on the MOT17, DanceTrack, and SportsMOT datasets. On these three datasets, the method achieves HOTA scores of 63.9%, 57.3%, and 75.6%, and IDF1 scores of 79.6%, 56.7%, and 78.8%, respectively. The results indicate that the method improves the accuracy and robustness of multi-object tracking, particularly in complex scenes.
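The temporal prediction idea described in the abstract (a buffer of past states combined with self-attention) can be illustrated with a small sketch. This is not the TBSTrack implementation: the abstract does not give the network design, so the sketch assumes boxes in `(cx, cy, w, h)` form, uses scaled dot-product attention with identity (unlearned) projections, and attends over frame-to-frame displacements. The names `softmax` and `attention_predict` are hypothetical.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_predict(buffer):
    """Predict the next (cx, cy, w, h) box from a buffer of past boxes.

    Assumption (not from the paper): the query is the most recent
    frame-to-frame displacement, and the keys and values are all
    displacements held in the buffer. A real model would use learned
    query/key/value projections instead of the identity used here.
    """
    boxes = list(buffer)
    if len(boxes) < 2:
        return boxes[-1]  # no motion history yet: keep the last box

    # Frame-to-frame displacements serve as both keys and values.
    deltas = [
        tuple(cur_v - prev_v for prev_v, cur_v in zip(prev, cur))
        for prev, cur in zip(boxes, boxes[1:])
    ]
    query = deltas[-1]
    dim = len(query)

    # Scaled dot-product attention scores between the query and each key.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in deltas
    ]
    weights = softmax(scores)

    # Attention-weighted displacement, added to the most recent box.
    motion = [
        sum(w * delta[i] for w, delta in zip(weights, deltas))
        for i in range(dim)
    ]
    return tuple(v + m for v, m in zip(boxes[-1], motion))


# Usage: a bounded buffer drops the oldest frame automatically.
history = deque(maxlen=5)
for box in [(0, 0, 10, 20), (2, 1, 10, 20), (4, 2, 10, 20)]:
    history.append(box)
predicted = attention_predict(history)
```

For a target moving at constant velocity, every displacement is identical, so the attention weights are uniform and the prediction reduces to a constant-velocity extrapolation. When motion changes, displacements similar to the latest one receive more weight, which is the intuition behind attending over a temporal buffer.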

LIN Jiarong (林佳熔), LIU Li (刘力). Multi-Object Tracking Method Based on Temporal Prediction and Feature Extraction[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252164.

