
Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 305-313. doi: 10.19678/j.issn.1000-3428.0062296

• Development Research and Engineering Application •

Chained End-to-End Pedestrian Multi-Object Tracking Network with Multi-Feature Fusion

ZHOU Haiyun1, XIANG Xuezhi2, WANG Xinyao2, REN Wenkai2

  1. Institute of Public Security, Nanjing Forest Police College, Nanjing 210023, China;
  2. School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
  • Received: 2021-08-03 Revised: 2021-09-20 Published: 2022-09-08
  • About the authors: ZHOU Haiyun (born 1980), female, associate professor, Ph.D.; main research interests: computer vision and pattern recognition. XIANG Xuezhi, associate professor. WANG Xinyao and REN Wenkai, master's students.
  • Supported by:
    Fundamental Research Funds for the Central Universities (LGY201802, LGZD202102); National Natural Science Foundation of China (61401113); Heilongjiang Provincial Science Foundation (LH2021F011); Huawei MindSpore Academic Fund.

Abstract: Object detection, feature extraction, and data association are the key components of a multi-object tracking network, and they usually work independently or are only partially combined. Although this separated design achieves good tracking performance, it increases network complexity and reduces tracking speed. To increase the speed of pedestrian multi-object tracking while maintaining accuracy, an end-to-end chained pedestrian multi-object tracking network is proposed. The network integrates object detection, feature extraction, and data association into a unified framework. Two consecutive frames form a node that serves as the input, and the network directly regresses paired bounding boxes for each target that appears in the two frames of the node. Because adjacent nodes share a common frame, the boxes predicted on that frame are highly similar, so data association needs only Intersection over Union (IoU) matching, which improves tracking speed. A bidirectional feature pyramid with multi-feature fusion is used to integrate high-level semantic information with low-level position information, and an improved deformable convolution (deformable convolution v2) is introduced into the pyramid to improve adaptability to target deformation. To address the imbalance between positive and negative samples and the differences in gradient contributions, focal loss and balanced L1 loss are combined into a multi-task learning loss that promotes balanced learning. Experimental results on the MOT17 dataset show that, compared with DeepSORT, TubeTK, CenterTrack, and other networks, the proposed network achieves an effective balance between tracking speed and accuracy, with a Multiple Object Tracking Accuracy (MOTA) of 69.6 at a tracking speed of 21.6 frame/s.

Key words: multi-object tracking, chained-tracker, multi-feature fusion, feature pyramid, multi-task loss function
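
The IoU-only association described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. The names `iou` and `link_nodes`, the `(x1, y1, x2, y2)` box layout, the greedy assignment, and the 0.5 threshold are all assumptions made for illustration; the abstract does not specify how matches are assigned. The idea shown is that each node yields box pairs for frames t and t+1, and the frame-t box of a pair is compared by IoU with the boxes that the previous node predicted on that same shared frame.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_nodes(
    prev_boxes: Dict[int, Box],
    node_pairs: List[Tuple[Box, Box]],
    next_id: int,
    iou_thresh: float = 0.5,  # hypothetical threshold, not from the source
) -> Tuple[Dict[int, Box], int]:
    """Chain the current node to existing tracks via the shared frame.

    prev_boxes: track id -> box on the shared frame t (from the previous node).
    node_pairs: (box on frame t, box on frame t+1) regressed by the current node.
    Returns track id -> box on frame t+1, plus the next unused track id.
    """
    # Score every (track, pair) combination on the shared frame, best first.
    candidates = sorted(
        (
            (iou(box, pair[0]), track_id, j)
            for track_id, box in prev_boxes.items()
            for j, pair in enumerate(node_pairs)
        ),
        reverse=True,
    )
    tracks: Dict[int, Box] = {}
    used_pairs = set()
    # Greedy assignment (a simplification chosen for this sketch).
    for score, track_id, j in candidates:
        if score < iou_thresh:
            break
        if track_id in tracks or j in used_pairs:
            continue
        tracks[track_id] = node_pairs[j][1]
        used_pairs.add(j)
    # Unmatched pairs start new tracks.
    for j, pair in enumerate(node_pairs):
        if j not in used_pairs:
            tracks[next_id] = pair[1]
            next_id += 1
    return tracks, next_id
```

Calling `link_nodes` once per node, and passing the returned dictionary as `prev_boxes` for the next node, extends each trajectory one frame at a time without any appearance-feature comparison. In this sketch, a track with no match in the current node is simply dropped; handling of temporarily lost targets is not covered here.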

CLC Number: