Multi-target tracking algorithm for aerial photography based on improved YOLOv8 and ByteTrack

doi:10.19678/j.issn.1000-3428.0252029

Abstract

Object detection and multi-target tracking technologies are maturing, but aerial multi-target tracking in complex scenes still suffers from small targets, large scale variation, and occlusion, all of which degrade detection and tracking. This paper therefore proposes YBTrack, an aerial multi-target tracking algorithm based on an improved YOLOv8 and ByteTrack. First, a detector, MSA-YOLO, is constructed. The original convolutions in YOLOv8 are replaced with a space-to-depth convolution that moves spatial information into the channel dimension, preserving target detail and reducing the missed and false detections caused by information loss during multi-scale feature map fusion. In addition, a lightweight, accelerated spatial-channel attention module is designed. It is used for the neck convolutions to reduce computational complexity, and it also serves as a feature refinement module before the detection head to further strengthen target feature extraction. Second, the ByteTrack tracker is optimized: a spatial-appearance similarity matrix (ASM) is designed to help the model distinguish similar targets, and a target correction function is proposed to reduce the error accumulated by the Kalman filter, lowering target drift and loss rates. Finally, MSA-YOLO is combined with the optimized ByteTrack for multi-target tracking experiments. MSA-YOLO improves mAP_0.5 by 9.4% on the VisDrone2019-DET dataset. The tracking algorithm improves MOTA by 11.2% and 8.3% and IDF1 by 8.9% and 7.4% on the VisDrone2019-MOT and MOT17 datasets, respectively. Comparison experiments with other multi-target tracking algorithms further support the advantage of the proposed algorithm.
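The rearrangement behind the convolution that "transforms spatial information into channel dimensions" can be illustrated with a small sketch. The following is only an illustration of the general space-to-depth idea, not the authors' MSA-YOLO layer: the function name `space_to_depth`, the `scale` parameter, the nested-list layout (channels x height x width), and the ordering of the output channels are all assumptions made here. In a real detector the same rearrangement would be a tensor operation followed by a non-strided convolution.

```python
def space_to_depth(x, scale=2):
    """Rearrange a C x H x W nested list into (C*scale^2) x (H/scale) x (W/scale).

    Illustrative sketch only: every pixel is kept, but each scale x scale
    neighbourhood is spread across channels instead of being discarded by
    a strided convolution or pooling step.
    """
    channels = len(x)
    height = len(x[0])
    width = len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    out = []
    # Each (dy, dx) offset inside a scale x scale cell yields one sub-sampled
    # copy of every input channel (channel ordering is an assumption).
    for dy in range(scale):
        for dx in range(scale):
            for c in range(channels):
                out.append([
                    [x[c][i][j] for j in range(dx, width, scale)]
                    for i in range(dy, height, scale)
                ])
    return out


# Hypothetical 1 x 4 x 4 feature map holding the values 0..15 row by row.
feature_map = [[[r * 4 + c for c in range(4)] for r in range(4)]]
rearranged = space_to_depth(feature_map, scale=2)
```

With `scale=2`, the single 4 x 4 channel becomes four 2 x 2 channels, so the spatial resolution halves while no value is dropped. This is the property the abstract relies on for preserving the detail of small targets.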


Zheng Mingyu, Shao Huichao, Shao Yanhua, Chu Hongyu. Multi-target tracking algorithm for aerial photography based on improved YOLOv8 and ByteTrack[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252029.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252029

