Computer Engineering ›› 2024, Vol. 50 ›› Issue (9): 130-141. doi: 10.19678/j.issn.1000-3428.0068414

• Artificial Intelligence and Pattern Recognition •

Unification Algorithm for Object Tracking and Segmentation Based on Transformer

LIN Chang*, GUO Wei, REN Zhecong, JIN Haibo

  1. College of Software, Liaoning Technical University, Huludao 125105, Liaoning, China
  • Received: 2023-09-20 Online: 2024-09-15 Published: 2024-01-31
  • Contact: LIN Chang
  • Supported by:
    National Natural Science Foundation of China (62173171)

Abstract:

Discriminative object tracking algorithms based on correlation filtering have received widespread attention for their good tracking performance. However, the bounding-box estimation used by these methods typically yields only an axis-aligned box, making it difficult to obtain finer-grained state information such as a rotated bounding box, the object contour, or a segmentation mask. To address this, a Transformer-based unified Single Object Tracking (SOT) and segmentation algorithm, T-TS, is proposed. First, the attention mechanism of the Transformer is used to localize the target precisely. Second, the resulting location encoding guides the segmentation network to separate the target from the background at the pixel level, producing a fine object mask. Morphological processing is then applied to the mask to obtain the best-fitting rotated bounding box and the object contour. Experiments were conducted on the VOT2018 tracking dataset and the DAVIS segmentation dataset. T-TS was more robust than Siamese-network trackers and more accurate than correlation-filter trackers, achieving an Expected Average Overlap (EAO) of 0.463 on VOT2018. It also performed well on video segmentation, with Jaccard indices of 77.3 and 65.3 on DAVIS2016 and DAVIS2017, respectively, and ran at 34 frame/s. The results show that the algorithm obtains accurate rotated bounding boxes, predicts the target precisely, and effectively handles target rotation and deformation.

Key words: Single Object Tracking (SOT), Transformer attention mechanism, object segmentation, morphological method, correlation filtering
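
The post-processing and evaluation steps named in the abstract (morphological cleanup of the mask, fitting a rotated box, and scoring with the Jaccard index) can be illustrated with a minimal sketch. It is not the T-TS implementation: the abstract does not specify the morphological operators or how the rotated box is optimized, so the 3x3 binary opening, the PCA-aligned box, the list-of-rows mask format, and all function names below are assumptions made for illustration only.

```python
import math


def _morph(mask, keep):
    """Apply a 3x3 binary filter; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i] if 0 <= j < h and 0 <= i < w else 0
                      for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1)]
            out[y][x] = 1 if keep(window) else 0
    return out


def opening(mask):
    """Binary opening (erosion, then dilation): removes small isolated blobs.
    Assumed stand-in for the paper's unspecified morphological processing."""
    return _morph(_morph(mask, all), any)


def oriented_box(mask):
    """Return (cx, cy, width, height, angle_deg) of a box aligned to the mask's
    principal axis (PCA). Extents are measured between pixel centres.
    This is an illustrative substitute, not the paper's box optimization."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal-axis angle
    c, s = math.cos(theta), math.sin(theta)
    u = [(x - mx) * c + (y - my) * s for x, y in pts]   # along the axis
    v = [-(x - mx) * s + (y - my) * c for x, y in pts]  # across the axis
    uc, vc = (max(u) + min(u)) / 2, (max(v) + min(v)) / 2
    cx, cy = mx + uc * c - vc * s, my + uc * s + vc * c
    return cx, cy, max(u) - min(u), max(v) - min(v), math.degrees(theta)


def jaccard(pred, truth):
    """Jaccard index (intersection over union) of two binary masks."""
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


# Hypothetical usage: a 3x3 blob plus one noisy pixel in a 6x6 mask.
noisy = [[0] * 6 for _ in range(6)]
for yy in range(1, 4):
    for xx in range(1, 4):
        noisy[yy][xx] = 1
noisy[5][5] = 1
cleaned = opening(noisy)
box = oriented_box(cleaned)
```

The `jaccard` function returns a value in [0, 1]; the scores in the abstract (77.3 and 65.3) appear to be the same measure expressed as a percentage. A production pipeline would more likely rely on an image-processing library for the opening and for a true minimum-area rectangle rather than the PCA approximation used here.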