Computer Engineering ›› 2024, Vol. 50 ›› Issue (4): 187-196. doi: 10.19678/j.issn.1000-3428.0067601

• Graphics and Image Processing •

Siamese Network Tracking Algorithm Based on Compensated Attention Mechanism

Yu AN*, Haibo GE, Wenhao HE, Sai MA, Mengyang CHENG

  1. School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China
  • Received: 2023-05-11 Online: 2024-04-15 Published: 2023-08-17
  • Contact: Yu AN

  • Funding:
    Natural Science Foundation of Shaanxi Province (2011JM8038); Shaanxi Province Key Industry Innovation Chain (Group) Project (S2019-YF-ZDCXL-0098)

Abstract:

To address common challenges in visual object tracking, including target scale variation, motion blur, occlusion, and interference from similar objects, this paper proposes CDAM-Siam, a Siamese network tracking algorithm based on a Compensatory Dual Attention Mechanism (CDAM). First, a ResNet-50 network is used as the backbone of the Siamese network to extract features at different levels, deepening the network while making full use of the features extracted by different layers. Second, the CDAM dual attention network with a compensation mechanism is integrated into the backbone to strengthen the effective features in the feature maps and weaken some edge features, improving the robustness of CDAM-Siam in complex scenes. Finally, a feature fusion network is constructed and added to the backbone to fuse feature maps from different levels into high-resolution, information-rich feature maps, ultimately achieving accurate target tracking. CDAM-Siam was trained on the GOT-10K and YouTube-BB datasets and tested on the OTB100 dataset. The results show a tracking success rate of 68.3% and a precision of 89.5%. The algorithm maintains good tracking performance under the common challenges of the tracking task, and its tracking speed reaches 56 frames per second, meeting real-time tracking requirements. On the VOT2018 dataset, it achieves an accuracy of 53.8%, a robustness of 39.4%, and an Expected Average Overlap (EAO) of 26.5%.

Key words: target tracking, Siamese network, ResNet-50 network, attention mechanism, feature fusion
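
The two OTB100 figures quoted in the abstract are conventionally an overlap-based success score and a center-error-based precision score. The sketch below illustrates how such scores are commonly computed; it is not the authors' evaluation code. The function names, the (x, y, width, height) box layout, the 20-pixel precision threshold, and the 21 evenly spaced overlap thresholds are assumptions based on common OTB practice.

```python
import math
from typing import Sequence, Tuple

# Assumed box layout: (x, y, width, height) with a top-left origin.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def precision(pred: Sequence[Box], gt: Sequence[Box],
              pixel_threshold: float = 20.0) -> float:
    """Fraction of frames whose center error is within the threshold."""
    hits = sum(center_error(p, g) <= pixel_threshold for p, g in zip(pred, gt))
    return hits / len(gt)


def success_auc(pred: Sequence[Box], gt: Sequence[Box], steps: int = 21) -> float:
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Given per-frame predicted and ground-truth boxes for one sequence, `precision(pred, gt)` and `success_auc(pred, gt)` each return a value in [0, 1]; benchmark-level scores are typically averaged over all sequences.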
