
Computer Engineering ›› 2021, Vol. 47 ›› Issue (11): 262-267,291. doi: 10.19678/j.issn.1000-3428.0059516

• Graphics and Image Processing •

Method for Estimating Monocular Image Depth Based on Dense Convolutional Network

WANG Yaqun, DAI Hualin, WANG Li, LI Guoyan   

  1. School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
  • Received: 2020-09-14 Revised: 2020-10-22 Published: 2020-11-05

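  • About the authors: WANG Yaqun (born 1995), female, master's degree, main research interests: image processing and computer vision; DAI Hualin, professor; WANG Li, lecturer, master's degree; LI Guoyan, lecturer, Ph.D.
  • Funding:
    Natural Science Foundation of Tianjin (17JCQNJC00500).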

Abstract: To address the low accuracy and complex network structures of existing monocular image depth estimation methods, a dense convolutional network for monocular image depth estimation is proposed, built as an end-to-end encoder-decoder. The encoder introduces the Dense Convolutional Network (DenseNet), in which each layer takes the outputs of all preceding layers as its input. This strengthens feature reuse and forward propagation while reducing the number of parameters and the amount of computation, and it mitigates the vanishing gradient problem to a certain extent. The decoder uses an up-projection module with dilated (atrous) convolution and a bilinear upsampling module to better express the image features extracted by the encoder, and finally produces the estimated depth map corresponding to the input image. The proposed network is trained, validated, and tested on NYU Depth V2, an indoor scene depth dataset. The results show that the proposed network achieves an accuracy of 0.851 at the threshold δ<1.25 and a Root Mean Square Error (RMSE) of 0.482.
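The two reported figures can be illustrated with a minimal sketch. It assumes the definitions conventionally used for depth benchmarks such as NYU Depth V2: threshold accuracy as the fraction of pixels with max(d/d*, d*/d) below 1.25, and RMSE computed over per-pixel depth differences. The abstract does not spell out these formulas, so the function names, the flat-list input format, and the toy values below are illustrative assumptions rather than the authors' evaluation code.

```python
import math


def threshold_accuracy(pred, target, threshold=1.25):
    """Fraction of pixels whose ratio max(pred/target, target/pred) is below threshold.

    Assumes strictly positive depths given as equal-length flat sequences.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    hits = sum(1 for p, t in zip(pred, target) if max(p / t, t / p) < threshold)
    return hits / len(pred)


def rmse(pred, target):
    """Root mean square error between predicted and ground-truth depths."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared_error / len(pred))


# Toy depths in metres (hypothetical values, not taken from the paper).
pred = [1.0, 2.0, 3.0, 4.0]
target = [1.1, 2.0, 4.0, 4.0]

accuracy = threshold_accuracy(pred, target)
error = rmse(pred, target)
```

Under these definitions a higher threshold accuracy and a lower RMSE both indicate better depth estimates, which is how the reported 0.851 and 0.482 would be read.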

Key words: dense convolutional network, monocular image, encoder, decoder, depth estimation

