Survey of Deep Learning-Based Object Detection Algorithms

doi:10.19678/j.issn.1000-3428.0062725

Abstract

Most conventional object detection algorithms rely on sliding windows and handcrafted feature extraction, and they suffer from high computational complexity and poor robustness in complex scenes. In recent years, researchers have applied deep learning to object detection, significantly improving algorithm performance. Compared with conventional algorithms, deep learning-based object detection algorithms are faster, more accurate, and more robust under complex conditions. This paper first describes the object detection task in terms of evaluation metrics, public datasets, and conventional algorithm frameworks. It then categorizes existing deep learning-based object detection algorithms according to two criteria: whether they use an explicit region proposal stage, and whether they define prior anchor boxes. For each category, it traces the evolution of the algorithms and summarizes their mechanisms, advantages, limitations, and application scenarios. On this basis, it analyzes and compares the performance of representative algorithms on public datasets and discusses future research directions for deep learning-based object detection.

Key words: object detection, deep learning, convolutional neural network, computer vision, feature extraction

CLC number: TP391.4

LI Kequan, CHEN Yan, LIU Jiachen, MU Xiangwei. Survey of Deep Learning-Based Object Detection Algorithms[J]. Computer Engineering, 2022, 48(7): 1-12.

https://www.ecice06.com/CN/Y2022/V48/I7/1

Figures/Tables: 9

