
Computer Engineering ›› 2022, Vol. 48 ›› Issue (5): 314-320. doi: 10.19678/j.issn.1000-3428.0061362

• Development Research and Engineering Application •

Method for Fast Video Object Detection Based on Local Attention

SHI Yuhu (史钰祜), ZHANG Qigui (张起贵)

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2021-04-02 Revised: 2021-05-14 Published: 2021-05-31
  • About the authors: SHI Yuhu (born 1995), male, master's student; main research interests are image processing and embedded systems. ZHANG Qigui, professor.
  • Funding:
    Natural Science Foundation of Shanxi Province (2013011017-3); Science and Technology Innovation Fund of Taiyuan University of Technology (9002-03011843).

Abstract: Video object detection aims to accurately classify and localize objects in a video. Existing deep-learning-based video object detection methods propagate features through optical flow; this requires a large number of model parameters, and applying optical flow directly to high-level features makes it difficult to establish accurate spatial correspondence. This study proposes a lightweight video object detection method. A feature propagation model is designed to propagate high-level features from key frames to non-key frames within local regions of the different frames, so that the limited computing resources are allocated to key frames and detection is accelerated. A dynamic key frame allocation module is constructed to adjust the key frame selection interval according to the target motion speed, reducing the amount of computation and improving detection accuracy. On this basis, to further reduce the maximum delay, an asynchronous detection mode is proposed in which the feature propagation model and the key frame selection module work cooperatively. Experimental results show that the detection speed and maximum delay of the method are 31.8 frame/s and 31 ms, respectively. Compared with the memory-enhanced global-local aggregation method, the proposed method achieves a faster detection speed while maintaining detection accuracy, and realizes real-time online video object detection.

Key words: video object detection, local attention, feature propagation, depthwise separable convolution, dynamic allocation, asynchronous detection
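
The idea of adjusting the key frame selection interval according to target motion speed can be illustrated with a minimal sketch. It is not the paper's module: the speed measure (box-center displacement normalized by the box diagonal), the linear mapping, and all names and thresholds (`Box`, `motion_speed`, `key_frame_interval`, `slow`, `fast`, `min_interval`, `max_interval`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box given by its center (cx, cy), width w and height h, in pixels."""
    cx: float
    cy: float
    w: float
    h: float


def motion_speed(prev: Box, curr: Box) -> float:
    """Hypothetical speed measure: center displacement between two consecutive
    frames, normalized by the diagonal of the current box."""
    dx, dy = curr.cx - prev.cx, curr.cy - prev.cy
    diag = (curr.w ** 2 + curr.h ** 2) ** 0.5
    return (dx * dx + dy * dy) ** 0.5 / max(diag, 1e-6)


def key_frame_interval(speed: float,
                       min_interval: int = 2,
                       max_interval: int = 12,
                       slow: float = 0.02,
                       fast: float = 0.20) -> int:
    """Map a motion speed to a key frame interval (assumed linear rule).

    Slow targets get a long interval (fewer expensive key frames);
    fast targets get a short interval (features are refreshed more often).
    """
    if speed <= slow:
        return max_interval
    if speed >= fast:
        return min_interval
    t = (speed - slow) / (fast - slow)  # 0 at `slow`, 1 at `fast`
    return round(max_interval - t * (max_interval - min_interval))
```

In this sketch, a detector would recompute the interval after each key frame and run the full network again only once that many frames have passed; the frames in between would reuse features propagated from the last key frame.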

CLC Number: