Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 222-230. doi: 10.19678/j.issn.1000-3428.0064268

• Graphics and Image Processing •

Depth Estimation Based on Scene Object Attention and Depth Map Fusion

WEN Jing, YANG Jie

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2022-03-22 Revised: 2022-05-27 Published: 2022-08-31
  • About the authors: WEN Jing (born 1982), female, associate professor, Ph.D.; her main research interests are computer vision, image processing, and pattern recognition. YANG Jie, master's student.
  • Funding:
    Fundamental Research Program of Shanxi Province (201901D211176).

Abstract: Existing monocular depth estimation algorithms recover three-dimensional information from a single image, and they suffer from blurred details at adjacent depth edges and noticeably missing objects. This paper proposes a monocular depth estimation algorithm based on a scene object attention mechanism and weighted depth map fusion. Similarity feature vectors between any two positions of the feature map are computed by feature matrix multiplication, which quickly captures long-range dependencies and enriches the context information used to estimate regions of similar depth, thereby addressing incomplete object depth information in natural scenes. Drawing on the advantages of multi-scale feature map fusion, a weighted depth map fusion module is designed: depth maps at multiple visual granularities, each carrying different depth information, are assigned different weights and fused. The fused depth map contains both depth information and rich scene object information, which effectively resolves the blurred-detail problem. Experimental results on the KITTI dataset show that the algorithm achieves an accuracy of 0.879 at σ<1.25, with an absolute relative error, squared relative error, and logarithmic root mean square error of 0.110, 0.765, and 0.185, respectively. The predicted depth maps have more complete scene object contours and more accurate depth information.
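The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the sketch assumes plain dot-product similarity with a row-wise softmax and a residual connection for the attention step, and nearest-neighbour upsampling with fixed, normalised weights for the fusion step (in the paper the weights may well be learned). The function names `position_attention`, `upsample_nearest`, and `fuse_depth_maps` are hypothetical.

```python
import numpy as np


def position_attention(feat):
    """Add global context to a (C, H, W) feature map.

    Assumption: similarity is a plain dot product between position
    vectors; the paper's module may use learned projections instead.
    """
    feat = np.asarray(feat, dtype=np.float64)
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                    # (C, N), N = H * W positions
    sim = x.T @ x                                 # (N, N) similarity of every position pair
    sim = sim - sim.max(axis=1, keepdims=True)    # stabilise the softmax numerically
    attn = np.exp(sim)
    attn = attn / attn.sum(axis=1, keepdims=True) # each row sums to 1
    context = x @ attn.T                          # context[:, i] = sum_j attn[i, j] * x[:, j]
    return feat + context.reshape(c, h, w)        # residual connection (assumed)


def upsample_nearest(depth, out_hw):
    """Nearest-neighbour upsampling by integer factors (assumed scheme)."""
    h, w = depth.shape
    out_h, out_w = out_hw
    if out_h % h or out_w % w:
        raise ValueError("output size must be an integer multiple of input size")
    return np.repeat(np.repeat(depth, out_h // h, axis=0), out_w // w, axis=1)


def fuse_depth_maps(depth_maps, weights):
    """Weighted sum of depth maps at several resolutions.

    Assumption: depth_maps[0] is the finest map and sets the output size;
    weights are fixed scalars, normalised here to sum to 1.
    """
    wts = np.asarray(weights, dtype=np.float64)
    wts = wts / wts.sum()
    out_hw = np.asarray(depth_maps[0]).shape
    fused = np.zeros(out_hw, dtype=np.float64)
    for depth, wt in zip(depth_maps, wts):
        depth = np.asarray(depth, dtype=np.float64)
        fused += wt * upsample_nearest(depth, out_hw)
    return fused
```

Because the similarity matrix has one row and one column per spatial position, a single matrix product relates every position to every other one, which is the sense in which long-range dependencies are captured in one step rather than through many stacked local convolutions.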

Key words: scene object attention, weighted depth map fusion, context information, depth estimation, three-dimensional reconstruction
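For readers unfamiliar with the four figures quoted in the abstract, the sketch below computes them using the definitions that are standard in the monocular depth estimation literature. It assumes the paper follows those definitions; the threshold accuracy that the abstract writes as σ<1.25 is usually written δ<1.25 elsewhere. The function name `depth_metrics`, the dictionary keys, and the convention that a ground-truth value of 0 marks a pixel without a measurement are assumptions of this sketch, and predictions are assumed to be strictly positive.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Standard depth-estimation metrics over pixels with valid ground truth.

    Assumptions: gt == 0 marks pixels without a measurement, and every
    predicted depth is strictly positive.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    valid = gt > 0                                # skip pixels lacking ground truth
    pred, gt = pred[valid], gt[valid]

    ratio = np.maximum(pred / gt, gt / pred)      # symmetric ratio, always >= 1
    log_diff = np.log(pred) - np.log(gt)
    return {
        "acc_1.25": float(np.mean(ratio < 1.25)),               # threshold accuracy
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),      # absolute relative error
        "sq_rel": float(np.mean((pred - gt) ** 2 / gt)),        # squared relative error
        "rmse_log": float(np.sqrt(np.mean(log_diff ** 2))),     # log-space RMSE
    }
```

Higher is better for the threshold accuracy, while lower is better for the three error terms, which is how the reported 0.879, 0.110, 0.765, and 0.185 should be read.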

CLC number: