Unification Algorithm for Object Tracking and Segmentation Based on Transformer

doi:10.19678/j.issn.1000-3428.0068414

Abstract

Discriminative object trackers based on correlation filtering have attracted wide attention because of their strong tracking performance. However, the bounding-box estimation used by these methods typically yields only an axis-aligned box, which makes it difficult to obtain finer-grained state information about the target, such as a rotated bounding box, the object contour, or a segmentation mask. To address this problem, a Transformer-based unified Single Object Tracking (SOT) and segmentation algorithm, T-TS, is proposed. First, the attention mechanism of the Transformer is used to localize the target precisely. Second, the resulting location encoding guides the segmentation network to separate the target from the background at the pixel level, producing a fine object mask. Morphological processing is then applied to the mask to obtain the best-fitting rotated bounding box and the object contour. Experiments were conducted on the VOT2018 tracking dataset and the DAVIS segmentation datasets. T-TS is more robust than Siamese-network trackers and more accurate than correlation-filter trackers, reaching an Expected Average Overlap (EAO) of 0.463 on VOT2018. It also performs well on video segmentation, with Jaccard indices of 77.3 and 65.3 on DAVIS2016 and DAVIS2017, respectively, at a running speed of 34 frame/s. The results indicate that the algorithm obtains accurate rotated bounding boxes, predicts the target state precisely, and effectively handles target rotation and deformation.

Key words: Single Object Tracking (SOT), Transformer attention mechanism, object segmentation, morphological method, correlation filtering
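
The abstract does not say which morphological operations are used or how the rotated box is fitted to the mask, so the following is only a minimal sketch under stated assumptions. It shows two things the abstract relies on: the Jaccard index (intersection over union of two binary masks, here on a 0 to 1 scale, whereas the reported figures appear to be on a 0 to 100 scale) and one simple way to derive an oriented box from a mask. The function names `jaccard_index` and `rotated_box_from_mask` are hypothetical, and the PCA-based fit is a stand-in chosen for brevity, not the method used in T-TS.

```python
import numpy as np


def jaccard_index(pred_mask, gt_mask):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated as a perfect match (a convention assumed here).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def rotated_box_from_mask(mask):
    """Fit an oriented box to a binary mask with PCA (illustrative stand-in).

    Returns (center_xy, (length, width), angle_deg), or None for an empty mask.
    Extents are measured between pixel centers.
    """
    ys, xs = np.nonzero(np.asarray(mask, dtype=bool))
    if xs.size == 0:
        return None
    pts = np.stack([xs, ys], axis=1).astype(float)
    mean = pts.mean(axis=0)
    centered = pts - mean
    # The principal axes of the foreground pixels give the box orientation.
    cov = centered.T @ centered / max(len(pts) - 1, 1)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues are returned in ascending order
    major, minor = eigvecs[:, 1], eigvecs[:, 0]
    proj_major = centered @ major
    proj_minor = centered @ minor
    length = proj_major.max() - proj_major.min()
    width = proj_minor.max() - proj_minor.min()
    # Box center: midpoint of the extents along each axis, mapped back to image coordinates.
    center = (mean
              + major * (proj_major.max() + proj_major.min()) / 2.0
              + minor * (proj_minor.max() + proj_minor.min()) / 2.0)
    angle = float(np.degrees(np.arctan2(major[1], major[0])) % 180.0)
    return (float(center[0]), float(center[1])), (float(length), float(width)), angle
```

A PCA fit follows the dominant elongation of the mask but does not guarantee the minimum-area rectangle; a convex-hull-based minimum-area fit would be the usual alternative when the tightest box is required.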

LIN Chang, GUO Wei, REN Zhecong, JIN Haibo. Unification Algorithm for Object Tracking and Segmentation Based on Transformer[J]. Computer Engineering, 2024, 50(9): 130-141.

https://www.ecice06.com/CN/Y2024/V50/I9/130

Figures and tables (12)

Fig.1 Schematic diagram of Transformer-based object detection algorithm

Fig.2 Flowchart of T-TS algorithm

Fig.3 Transformer encoder and decoder

Fig.4 Segmentation stage of the proposed algorithm

Fig.5 Result comparison on VOT2018 dataset

Fig.6 Experimental results of each algorithm on ants3 sequence

Fig.7 Experimental results of each algorithm on gymnastics2 sequence

Fig.8 Tracking and segmentation visualization

Fig.9 Real-time tracking visualization
