Computer Engineering (计算机工程), 2022, Vol. 48, Issue 4: 1-15. doi: 10.19678/j.issn.1000-3428.0062227

• Hot Topics and Reviews •

面向视频数据的深度学习目标识别算法综述

王振华1, 李静1, 张鑫月1, 郑宗生1, 卢鹏1, 栾奎峰2   

  1. 上海海洋大学 信息学院, 上海 201306;
  2. 上海海洋大学 海洋科学学院, 上海 201306
  • 收稿日期:2021-07-30 修回日期:2021-10-23 发布日期:2021-11-17
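  • About the authors: WANG Zhenhua (b. 1982), female, associate professor, Ph.D.; main research interest: deep learning. LI Jing and ZHANG Xinyue, master's students. ZHENG Zongsheng, LU Peng, and LUAN Kuifeng, associate professors, Ph.D.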
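  • Funding:
    National Natural Science Foundation of China (61972240); Shanghai Local Universities Capacity Building Project (19050502100); Shanghai Municipal Oceanic Bureau Research Project (Hu Hai Ke 2020-05).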

Survey of Target Recognition Algorithms for Video Data Using Deep Learning

WANG Zhenhua1, LI Jing1, ZHANG Xinyue1, ZHENG Zongsheng1, LU Peng1, LUAN Kuifeng2

  1. College of Information, Shanghai Ocean University, Shanghai 201306, China;
  2. College of Marine Sciences, Shanghai Ocean University, Shanghai 201306, China
  • Received:2021-07-30 Revised:2021-10-23 Published:2021-11-17

摘要: 目标识别是计算机视觉领域的一大挑战,随着深度学习的发展,目标识别算法被广泛应用于视频数据中目标的识别和监测。对现有目标识别算法进行归纳,根据是否采用锚点机制将主流算法分为Anchor-Based和Anchor-Free两大类。针对R-CNN、SPP-Net、SSD、YOLOv2等Anchor-Based类目标识别算法,从候选框创建、特征提取和结果生成角度分析基于区域和基于回归的目标识别算法的区别和各自优势。针对CornerNet、ExtremeNet、CenterNet、FCOS等Anchor-Free类目标识别算法,从特征提取、关键点选择/层次结构和结果生成角度分析基于关键点和基于特征金字塔的目标识别算法的区别和各自优势。在此基础上,以识别效率和识别精度为评价指标,对Faster R-CNN、Mask R-CNN、SSD等8种代表性目标识别算法进行对比总结。最后,针对目标识别算法中的数据预处理耗时长、多尺度特征同步识别精度低、结构繁杂等问题,对当前研究的不足和未来研究方向进行分析和展望。

关键词: 深度学习, 目标识别, 锚定框, 候选区域, 关键点, 视频数据

Abstract: Target recognition is a major challenge in computer vision. With the development of deep learning, target recognition algorithms have been widely applied to recognizing and monitoring targets in video data. This paper summarizes existing target recognition algorithms and divides the mainstream algorithms into two categories, Anchor-Based and Anchor-Free, according to whether they use an anchor mechanism. For Anchor-Based algorithms such as R-CNN, SPP-Net, SSD, and YOLOv2, the differences and respective advantages of region-based and regression-based algorithms are analyzed in terms of candidate box creation, feature extraction, and result generation. For Anchor-Free algorithms such as CornerNet, ExtremeNet, CenterNet, and FCOS, the differences and respective advantages of key point-based and feature pyramid-based algorithms are analyzed in terms of feature extraction, key point selection/hierarchy, and result generation. On this basis, eight representative target recognition algorithms, including Faster R-CNN, Mask R-CNN, and SSD, are compared and summarized with recognition efficiency and recognition accuracy as the evaluation indices. Finally, the shortcomings of current research and future research directions are discussed with respect to problems such as time-consuming data preprocessing, low accuracy in the simultaneous recognition of multi-scale features, and complex algorithm structure.

Key words: deep learning, target recognition, anchor box, region proposal, key point, video data

CLC number: