
Computer Engineering (计算机工程) ›› 2023, Vol. 49 ›› Issue (7): 161-168. doi: 10.19678/j.issn.1000-3428.0064969

• Graphics and Image Processing •

Stereo Matching Algorithm Combining Multiple Attention and Iterative Optimization

Hua HOU, Hongyang GUO*, Chaona DAI, Junhui LI

  1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056000, Hebei, China
  • Received: 2022-06-13  Online: 2023-07-15  Published: 2023-07-14
  • Corresponding author: Hongyang GUO
  • About the authors:

    Hua HOU (born 1980), female, professor, Ph.D.; her main research interests include computer vision, mobile communication, cognitive radio, and the Internet of Things

    Chaona DAI, M.S. candidate

    Junhui LI, M.S. candidate

  • Funding:
    Natural Science Foundation of Hebei Province (F2022402001)

Abstract:

Stereo matching algorithms based on deep learning offer high accuracy, but they suffer from slow processing speed, high video memory consumption, and a restricted disparity search range. To address these issues, this paper introduces a stereo matching algorithm that combines multiple attention mechanisms with iterative optimization. A cross-attention module built on the Transformer structure aggregates global and local feature information between the left and right images and captures long-distance dependencies along the epipolar direction, so that global features from both images are fused more effectively and a disparity map is generated without a preset disparity range limit. An iterative residual optimization module is designed that uses the cross-attention module to generate a dense, range-unconstrained disparity map at the smallest scale, then iteratively restores the disparity resolution, builds sparse cost volumes, and estimates disparity residual maps through disparity regression, retaining the advantages of the dense disparity map while reducing computational cost and memory consumption. Furthermore, a context attention module is developed to capture dynamic and static context feature information, reduce the number of floating-point operations and parameters, and provide rich salient features for cost aggregation. Experimental results on the SceneFlow, KITTI2012, and KITTI2015 datasets show that, compared with the mainstream algorithm AANet, the proposed algorithm improves accuracy by 0.46%, 0.47%, and 0.25%, respectively, while inference speed decreases by an average of 50%.
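The following is a minimal PyTorch sketch of the epipolar cross-attention and soft-argmax disparity regression idea summarized in the abstract. It is an illustrative assumption, not the authors' implementation: all module names, tensor shapes, the clamping step, and the use of nn.MultiheadAttention are hypothetical stand-ins.

```python
import torch
import torch.nn as nn


class EpipolarCrossAttention(nn.Module):
    """Cross-attention between rectified left/right feature maps.

    For rectified stereo pairs, corresponding pixels lie on the same row,
    so attention is computed independently along each epipolar line.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feat_left, feat_right):
        # feat_*: (B, C, H, W) features from a shared backbone (assumed).
        b, c, h, w = feat_left.shape
        # Fold rows into the batch so each epipolar line is attended separately.
        q = feat_left.permute(0, 2, 3, 1).reshape(b * h, w, c)
        kv = feat_right.permute(0, 2, 3, 1).reshape(b * h, w, c)
        # Left-image pixels query right-image positions on the same row.
        out, weights = self.attn(q, kv, kv)  # out: (B*H, W, C), weights: (B*H, W, W)
        fused = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return fused, weights


def soft_argmax_disparity(weights, batch, height):
    """Regress disparity from row-wise matching weights (soft-argmax).

    The disparity of a left pixel at column x is x minus the expected matching
    column in the right image, so no fixed disparity search range is required.
    """
    _, w, _ = weights.shape
    cols = torch.arange(w, dtype=weights.dtype, device=weights.device)
    expected_right_col = (weights * cols).sum(dim=-1)        # (B*H, W)
    disparity = cols.unsqueeze(0) - expected_right_col       # x_left - x_right
    return disparity.clamp(min=0).reshape(batch, height, w)  # (B, H, W)


if __name__ == "__main__":
    # Tiny smoke test with random features standing in for backbone outputs.
    b, c, h, w = 1, 32, 8, 16
    left, right = torch.randn(b, c, h, w), torch.randn(b, c, h, w)
    module = EpipolarCrossAttention(channels=c)
    fused, weights = module(left, right)
    disp = soft_argmax_disparity(weights, b, h)
    print(fused.shape, disp.shape)  # (1, 32, 8, 16) and (1, 8, 16)
```

Folding the image rows into the batch dimension restricts attention to a single epipolar line, which is what removes the need for a fixed disparity search range in this sketch; the paper's iterative residual optimization and context attention modules would operate on top of such an initial, range-unconstrained estimate.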

Key words: stereo matching, depth estimation, attention, Transformer structure, residual