
Computer Engineering ›› 2022, Vol. 48 ›› Issue (12): 241-247, 254. doi: 10.19678/j.issn.1000-3428.0063606

• Graphics and Image Processing •

Stereo Matching Network Combining Deformable Convolution and Bilateral Grid

LIU Zhenguo, LI Zhao, SONG Tengteng, HE Yizhi

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China
  • Received: 2021-12-24 Revised: 2022-02-11 Published: 2022-02-14
  • About the authors: LIU Zhenguo (born 1998), male, master's student, main research interest: binocular stereo matching; LI Zhao (corresponding author), lecturer, Ph.D.; SONG Tengteng and HE Yizhi, master's students.
  • Funding:
    Youth Innovation Team Development Program of Shandong Higher Education Institutions (2019KJN048).

Abstract: Binocular stereo matching is widely used in 3D reconstruction applications such as autonomous driving, robot navigation, and augmented reality. Deep-learning-based stereo matching networks that use multi-scale 2D convolution for cost aggregation suffer from poor robustness of disparity prediction at object edges and from weak feature extraction. To address these problems, a stereo matching network that combines deformable convolution with a bilateral grid is proposed. An improved Feature Pyramid Network (FPN) is used for feature extraction; attention feature enhancement, an attention mechanism, and the Meta-ACON activation function are introduced into the improved FPN to extract image features fully and reduce the loss of semantic information, thereby improving feature extraction performance. A cross-correlation layer performs the matching computation to obtain multi-scale 3D cost volumes. A 2D deformable convolution cost aggregation structure aggregates the multi-scale 3D cost volumes to address the edge-fattening problem. A bilateral grid then upsamples the aggregated low-resolution cost volume, and the disparity map is obtained through disparity regression. Experimental results show that the End Point Error (EPE) of the network on the Scene Flow dataset is 0.75, which is 13.8% lower than that of AANet, and that the 3px error rate in non-occluded regions on the KITTI2012 dataset is 1.81%. The network accurately predicts disparity at object edges and in small regions.

Key words: binocular vision, stereo matching, bilateral grid, deformable convolution, attention mechanism
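
The matching, regression, and evaluation steps named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it omits the improved FPN, the deformable-convolution aggregation, and the bilateral-grid upsampling, and all function names are hypothetical. It assumes that the correlation score is the channel-averaged product of left and right features (higher means a better match), that disparity regression is a softmax-weighted expectation over disparities, and that the 3px metric counts a pixel as wrong when its absolute disparity error exceeds 3 pixels (the non-occlusion mask is not modelled).

```python
import math


def correlation_cost_volume(left, right, max_disp):
    """Build a 3D cost volume cost[d][y][x] from [C][H][W] feature maps.

    Each entry is the channel-averaged product of the left feature at x and
    the right feature at x - d; positions with x - d < 0 stay at 0.
    """
    channels, height, width = len(left), len(left[0]), len(left[0][0])
    cost = [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
    for d in range(max_disp):
        for y in range(height):
            for x in range(d, width):
                dot = sum(left[c][y][x] * right[c][y][x - d]
                          for c in range(channels))
                cost[d][y][x] = dot / channels
    return cost


def regress_disparity(cost):
    """Soft-argmax regression: softmax over d, then the expected disparity."""
    num_disp, height, width = len(cost), len(cost[0]), len(cost[0][0])
    disparity = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            scores = [cost[d][y][x] for d in range(num_disp)]
            peak = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            disparity[y][x] = sum(d * w for d, w in enumerate(weights)) / total
    return disparity


def _abs_errors(pred, truth):
    """Flatten two [H][W] disparity maps into a list of absolute errors."""
    return [abs(p - t)
            for pred_row, truth_row in zip(pred, truth)
            for p, t in zip(pred_row, truth_row)]


def end_point_error(pred, truth):
    """EPE: mean absolute disparity error over all pixels."""
    errors = _abs_errors(pred, truth)
    return sum(errors) / len(errors)


def bad_pixel_rate(pred, truth, threshold=3.0):
    """Fraction of pixels whose absolute error exceeds `threshold` pixels."""
    errors = _abs_errors(pred, truth)
    return sum(1 for e in errors if e > threshold) / len(errors)
```

In a real network the cost volume would be computed on learned feature maps at several scales and aggregated before regression; here the raw correlation scores are regressed directly, which is only meant to show how the pieces fit together.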

CLC Number: