Siamese Network Tracking Algorithm Based on Compensated Attention Mechanism

doi:10.19678/j.issn.1000-3428.0067601

Abstract

To address common challenges in visual object tracking, including target scale variation, motion blur, occlusion, and interference from similar objects, this paper proposes CDAM-Siam, a Siamese network tracking algorithm based on a compensated attention mechanism. First, a ResNet-50 network is used as the backbone of the Siamese network to extract features at different levels, which deepens the network while making full use of the features extracted by different layers. Second, a dual attention network with a compensation mechanism, the Compensatory Dual Attention Mechanism (CDAM), is integrated into the backbone to strengthen the effective features in the feature maps and weaken some edge features, improving the robustness of CDAM-Siam in complex scenes. Finally, a feature fusion network is constructed and added to the backbone to fuse feature maps from different levels into high-resolution, information-rich feature maps, ultimately achieving accurate target tracking. CDAM-Siam was trained on the GOT-10K and YouTube-BB datasets and tested on the OTB100 dataset. The results show a tracking success rate of 68.3% and a precision of 89.5%. The algorithm maintains good tracking performance under the common challenges of tracking tasks and runs at up to 56 frames per second, which meets real-time tracking requirements. On the VOT2018 dataset, it achieves 53.8% accuracy, 39.4% robustness, and a 26.5% Expected Average Overlap (EAO).

Key words: target tracking, Siamese network, ResNet-50 network, attention mechanism, feature fusion
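
The abstract names a dual attention network with a compensation mechanism but does not specify its layers. The following is only a minimal NumPy sketch of the general idea, assuming a CBAM-style channel attention followed by a simplified spatial attention. The function names, the weights `w1` and `w2`, and in particular the residual "compensation" term scaled by `alpha` are illustrative assumptions, not the authors' design.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map.

    w1 with shape (C // r, C) and w2 with shape (C, C // r) form a shared
    two-layer MLP (hypothetical weights, random in the demo below).
    """
    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    weights = sigmoid(mlp(avg) + mlp(mx))    # per-channel weights in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat, a=1.0, b=1.0):
    """Simplified spatial attention.

    Assumption: the learned convolution used in CBAM is replaced by a fixed
    weighted sum of the channel-wise mean and max maps.
    """
    avg = feat.mean(axis=0)           # (H, W)
    mx = feat.max(axis=0)             # (H, W)
    mask = sigmoid(a * avg + b * mx)  # per-location weights in (0, 1)
    return feat * mask[None, :, :]


def dual_attention_with_compensation(feat, w1, w2, alpha=0.5):
    """Channel then spatial attention, plus a placeholder compensation term."""
    attended = spatial_attention(channel_attention(feat, w1, w2))
    # Hypothetical compensation: add back a fraction of the input so that
    # features suppressed by both attention stages are not lost entirely.
    return attended + alpha * feat


# Demo on random data: 8 channels, 4x4 spatial size, reduction ratio r = 4.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
out = dual_attention_with_compensation(feat, w1, w2)
print(out.shape)
```

With `alpha=0.0` the sketch reduces to plain sequential attention, which can only shrink feature magnitudes; the added term is one simple way to keep some of the original signal.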

Yu AN, Haibo GE, Wenhao HE, Sai MA, Mengyang CHENG. Siamese Network Tracking Algorithm Based on Compensated Attention Mechanism[J]. Computer Engineering, 2024, 50(4): 187-196.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0067601

http://www.ecice06.com/EN/Y2024/V50/I4/187

Figures/Tables 14

Fig.1 SiamRPN algorithm framework

Fig.2 Framework of CDAM-Siam tracking algorithm

Fig.3 CDAM attention model

Fig.4 Visual results in OTB100

Fig.5 Channel attention network

Fig.6 Spatial attention network

Fig.7 Compensated attention mechanism model

Fig.8 Feature fusion module

Fig.9 Evaluation results of various algorithms on the OTB100 dataset

Fig.10 Tracking results of 7 algorithms in OTB100 partial video sequences
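
The success rate and precision reported for OTB100 are commonly computed from per-frame bounding-box overlap and center location error. The sketch below assumes the usual OTB conventions (boxes as `(x, y, w, h)`, success as the area under the curve over 21 overlap thresholds from 0 to 1, precision at a 20-pixel threshold). The function names are illustrative, and the paper's exact evaluation settings are not stated on this page.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def success_auc(preds, gts, steps=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds, gts, pixels=20.0):
    """Fraction of frames whose center error is within `pixels`."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixels for e in errors) / len(errors)


# Two-frame toy sequence: the first frame is tracked exactly and the
# second prediction is shifted 30 pixels away from the ground truth.
gts = [(0, 0, 10, 10), (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (30, 0, 10, 10)]
print(success_auc(preds, gts), precision_at(preds, gts))
```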
