Single-Stage Object Detection Algorithm Based on Dilated Convolution and Feature Enhancement

doi:10.19678/j.issn.1000-3428.0058315

Abstract

In object detection algorithms based on Convolutional Neural Networks (CNNs), shallow feature maps contain rich detail but lack semantic information, whereas deep feature maps carry semantic information but lack detail. To make full use of both shallow and deep feature maps and to address multi-scale object detection, a single-stage object detection algorithm based on dilated convolution and feature enhancement, AFE-SSD, is proposed. Built on the Single Shot MultiBox Detector (SSD), the algorithm fuses adjacent pairs of SSD feature maps to enrich the semantic information of the shallow feature layers. It then improves the parallel dilated convolution mechanism to construct a multi-scale feature extraction module, and feeds the fused feature maps into this module. This both enriches the multi-scale information of the feature maps and strengthens the feature extraction capability of the backbone network. Experimental results on the PASCAL VOC2007 test set show that AFE-SSD achieves a mean Average Precision (mAP) of 79.8% at a detection speed of 58.8 frame/s. Compared with the SSD and DSSD algorithms, it improves mAP by 2.4 and 1.2 percentage points, respectively, which verifies the effectiveness of the proposed feature fusion method and multi-scale extraction module.

Key words: Convolutional Neural Network (CNN), SSD algorithm, feature fusion, dilated convolution, object detection
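
The two ideas named in the abstract, fusing adjacent feature maps and running dilated convolutions in parallel, can be illustrated with a small one-dimensional toy. This is only a sketch under stated assumptions, not the AFE-SSD implementation: the abstract does not specify the fusion operator, the dilation rates, or how branches are merged, so the nearest-neighbour upsampling, element-wise sum, and rates (1, 2, 3) below are all hypothetical choices, as are the function names.

```python
def effective_kernel(k, d):
    """Span (in samples) covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(x, w, d):
    """Zero-padded, same-length 1-D dilated convolution (odd kernel size assumed)."""
    k = len(w)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - k // 2) * d  # taps are spaced d samples apart
            if 0 <= idx < len(x):       # positions outside x count as zero
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def fuse_adjacent(shallow, deep):
    """Assumed fusion: upsample the deeper map and add it element-wise."""
    up = upsample2(deep)
    if len(up) != len(shallow):
        raise ValueError("deep map must be half the length of the shallow map")
    return [s + u for s, u in zip(shallow, up)]


def multi_scale(x, w, rates=(1, 2, 3)):
    """Assumed module: parallel dilated branches merged by summation."""
    branches = [dilated_conv1d(x, w, d) for d in rates]
    return [sum(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    shallow = [1.0, 2.0, 3.0, 4.0]   # higher resolution, more detail
    deep = [10.0, 20.0]              # lower resolution, more semantics
    fused = fuse_adjacent(shallow, deep)
    print(fused)
    print(multi_scale(fused, [1.0, 1.0, 1.0]))
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 samples without adding weights, which is why parallel branches at different rates are a common way to gather multi-scale context.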


CLC Number: TP391.41

JIANG Jun, ZHAI Donghai. Single-Stage Object Detection Algorithm Based on Dilated Convolution and Feature Enhancement[J]. Computer Engineering, 2021, 47(7): 232-238,248.

姜竣, 翟东海. 基于空洞卷积与特征增强的单阶段目标检测算法[J]. 计算机工程, 2021, 47(7): 232-238,248.


URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0058315

http://www.ecice06.com/EN/Y2021/V47/I7/232

Figures/Tables 12

