
Computer Engineering ›› 2024, Vol. 50 ›› Issue (7): 293-302. doi: 10.19678/j.issn.1000-3428.0068126

• Graphics and Image Processing •

Spatial Propagation-based Multi-View 3D Reconstruction

Xiying ZHANG1, Shoudong SUN1, Haihao YU2, Jilong BIAN1,*

  1. School of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, Heilongjiang, China
  2. School of Computer Science and Technology, Heilongjiang University of Engineering, Harbin 150050, Heilongjiang, China
  • Received: 2023-07-21 Published: 2024-07-15 Online: 2023-11-14
  • Corresponding author: Jilong BIAN
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (6210114); Heilongjiang Provincial Philosophy and Social Science Research Planning Project (21TQB117)

Abstract:

To address the poor point-cloud completeness of multi-view 3D reconstruction, this study proposes a Spatial Propagation-based Multi-View Stereo depth estimation Network (SP-MVSNet). Spatial propagation is introduced for dense point-cloud reconstruction under complex conditions, and two components are designed around it: a hybrid depth hypothesis strategy and a spatial-aware refinement module. The hybrid depth hypothesis strategy infers depth in a coarse-to-fine manner and treats depth estimation as a multi-label classification task. A cross-entropy loss is applied to the regularized probability volume to constrain the cost volume, which avoids the overfitting and slow convergence of regression-based methods. The spatial-aware refinement module takes guidance from feature maps that carry high-level semantic representations. After a confidence check, it uses a convolutional spatial propagation network, which constructs an affinity matrix, to refine the final depth map. In addition, because most methods reconstruct unreliable regions that violate multi-view consistency poorly, an attention-based dynamic feature extraction network with sample-adaptive capability is designed to strengthen the model's local perception. Experimental results show that on the DTU dataset, SP-MVSNet improves reconstruction completeness by 32.8% and overall quality by 11.4% compared with CVP-MVSNet. On the Tanks and Temples benchmark and the BlendedMVS dataset, SP-MVSNet also outperforms most known methods and achieves good 3D reconstruction results.
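
The classification view of depth estimation described in the abstract can be illustrated for a single pixel. The following is a minimal sketch and not the SP-MVSNet implementation: the uniform hypothesis sampling, the nearest-hypothesis one-hot label, the refinement window of one coarse step on each side, and all function names and numbers are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over per-hypothesis scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def depth_hypotheses(d_min, d_max, num):
    """Uniformly spaced depth hypotheses in [d_min, d_max] (assumed sampling)."""
    step = (d_max - d_min) / (num - 1)
    return [d_min + i * step for i in range(num)]

def nearest_label(hypotheses, gt_depth):
    """Index of the hypothesis closest to the ground-truth depth."""
    return min(range(len(hypotheses)),
               key=lambda i: abs(hypotheses[i] - gt_depth))

def cross_entropy(prob, label):
    """Cross-entropy against a one-hot target: -log p[label]."""
    return -math.log(max(prob[label], 1e-12))

def refine_range(hypotheses, best, num):
    """Coarse-to-fine step: resample `num` hypotheses around the winner.

    The window of one coarse step on each side is an assumption.
    """
    step = hypotheses[1] - hypotheses[0]
    centre = hypotheses[best]
    return depth_hypotheses(centre - step, centre + step, num)

# Coarse stage for one pixel: 5 hypotheses between 400 and 800 (made-up range).
coarse = depth_hypotheses(400.0, 800.0, 5)
scores = [0.1, 0.3, 2.0, 0.4, 0.1]            # made-up regularized scores
prob = softmax(scores)                        # one slice of a probability volume
label = nearest_label(coarse, gt_depth=612.0)
loss = cross_entropy(prob, label)             # classification loss for this pixel

# Fine stage: narrow the search range around the most probable hypothesis.
best = max(range(len(prob)), key=prob.__getitem__)
fine = refine_range(coarse, best, 5)
```

In this toy case the loss supervises the whole probability distribution over hypotheses rather than a single regressed depth value, and the fine stage halves the hypothesis spacing around the coarse winner.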

Key words: stereo vision, spatial propagation, dense point-cloud reconstruction, attention mechanism, depth estimation
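
The refinement idea named in the abstract, propagating depth between neighbouring pixels through an affinity matrix after a confidence check, can be sketched on a tiny grid. This is an illustrative sketch only: it follows the general convolutional spatial propagation update (normalized neighbour affinities, with the remaining weight given to the initial depth), while the 4-neighbourhood, the hand-set uniform affinities, the rule that confident pixels stay fixed, and all names are assumptions. In the paper the affinities are derived from guidance feature maps.

```python
def cspn_step(depth_t, depth_0, affinity, confident):
    """One spatial propagation step on a 2D grid stored as lists of lists.

    affinity[i][j] maps a neighbour offset (di, dj) to a raw affinity value.
    confident[i][j] is True where the initial depth passed the confidence check.
    """
    h, w = len(depth_t), len(depth_t[0])
    out = [row[:] for row in depth_t]
    for i in range(h):
        for j in range(w):
            if confident[i][j]:
                # Assumption: trusted pixels keep their initial depth.
                out[i][j] = depth_0[i][j]
                continue
            # Keep only the neighbours that lie inside the image.
            raw = {(di, dj): a for (di, dj), a in affinity[i][j].items()
                   if 0 <= i + di < h and 0 <= j + dj < w}
            norm = sum(abs(a) for a in raw.values())
            if norm == 0.0:
                continue
            # Normalize so the absolute neighbour weights sum to 1.
            weights = {off: a / norm for off, a in raw.items()}
            # The remaining weight is applied to the initial depth.
            centre = 1.0 - sum(weights.values())
            value = centre * depth_0[i][j]
            for (di, dj), wgt in weights.items():
                value += wgt * depth_t[i + di][j + dj]
            out[i][j] = value
    return out

def uniform_affinity(h, w):
    """Hypothetical stand-in for learned affinities: equal 4-neighbour weights."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return [[{off: 1.0 for off in offsets} for _ in range(w)]
            for _ in range(h)]

# A flat surface at depth 10 with one unreliable outlier in the middle.
depth_0 = [[10.0, 10.0, 10.0],
           [10.0, 20.0, 10.0],
           [10.0, 10.0, 10.0]]
confident = [[True, True, True],
             [True, False, True],
             [True, True, True]]
affinity = uniform_affinity(3, 3)

refined = depth_0
for _ in range(3):  # a few propagation iterations
    refined = cspn_step(refined, depth_0, affinity, confident)
```

With these made-up inputs the low-confidence centre pixel is pulled toward its confident neighbours, which is the behaviour the refinement step is meant to provide in unreliable regions.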