Monocular Image Depth Estimation Network with Multiple Level Feature Fusion Structure

doi:10.19678/j.issn.1000-3428.0056477

Abstract

Abstract: Monocular image depth estimation with convolutional neural networks (CNNs) suffers from inaccurate depth information, blurred edges, and missing details. To address these problems, this paper proposes a deep convolutional network with a multi-level feature fusion structure. The network adopts an end-to-end encoder-decoder structure. The encoder uses the ResNet101 network structure to convert the image into a high-dimensional feature map. The decoder uses up-sampling convolution modules to reconstruct a depth image from the high-dimensional feature map, and features from different levels of the encoder and decoder are fused. Experimental results on the NYUv2 and KITTI datasets show that, compared with other advanced networks, the proposed network not only predicts more accurate depth information but also preserves the edge information of the predicted depth image.

Key words: monocular image, depth estimation, encoder-decoder structure, multi-level fusion, sub-pixel convolution
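The keywords mention sub-pixel convolution and multi-level fusion, but the abstract does not spell out either operation. The following is a minimal sketch, not the authors' implementation. It shows only the channel-to-space rearrangement ("pixel shuffle") that follows the convolution in a sub-pixel convolution layer, plus one possible fusion step. The names `pixel_shuffle` and `fuse_levels` are hypothetical, and fusing by channel concatenation is an assumption; the paper may fuse features differently. Nested lists stand in for tensors so the example has no dependencies.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    This is the rearrangement step of sub-pixel convolution: each group of
    r*r channels supplies the r x r block of output pixels at one location.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    return [
        [
            [x[c * r * r + (i % r) * r + (j % r)][i // r][j // r]
             for j in range(w * r)]
            for i in range(h * r)
        ]
        for c in range(c_out)
    ]


def fuse_levels(decoder_feat, encoder_feat):
    """Fuse two same-resolution feature maps along the channel axis.

    Assumption: fusion is channel concatenation (skip-connection style).
    """
    dec_size = (len(decoder_feat[0]), len(decoder_feat[0][0]))
    enc_size = (len(encoder_feat[0]), len(encoder_feat[0][0]))
    if dec_size != enc_size:
        raise ValueError("feature maps must share height and width")
    return decoder_feat + encoder_feat


# Toy usage: a 4-channel 2x2 decoder feature is up-sampled by r=2 to
# 1 channel at 4x4, then fused with a 3-channel 4x4 encoder feature.
low_res = [[[c * 4 + i * 2 + j for j in range(2)] for i in range(2)]
           for c in range(4)]
upsampled = pixel_shuffle(low_res, 2)
skip = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
fused = fuse_levels(upsampled, skip)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a real network the input to the rearrangement would come from a learned convolution that produces `C*r*r` channels, and the fused map would pass through further convolutions before the next up-sampling stage.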

CLC number: TP391

JIA Ruiming, LI Yang, LI Tong, CUI Jiali, WANG Yiding. Monocular Image Depth Estimation Network with Multiple Level Feature Fusion Structure[J]. Computer Engineering, 2020, 46(12): 207-214.

http://www.ecice06.com/CN/Y2020/V46/I12/207

Figures/tables: 16

