Spatial Propagation-based Multi-View 3D Reconstruction

doi:10.19678/j.issn.1000-3428.0068126

Abstract:

This study proposes a Spatial Propagation-based Multi-View Stereo depth estimation Network (SP-MVSNet) to address the poor completeness of point clouds in multi-view 3D reconstruction. The idea of spatial propagation is introduced for dense point-cloud reconstruction under complex conditions, and a spatial propagation-based hybrid depth hypothesis strategy and a spatial perception optimization module are designed. The hybrid depth hypothesis strategy infers depth from coarse to fine and treats depth estimation as a multi-label classification task: a cross-entropy loss is applied to the regularized probability volume to constrain the cost volume, which avoids the overfitting and slow convergence of regression-based methods. The spatial perception optimization module takes guidance from feature maps that contain high-level semantic feature representations and, after a confidence check, applies a convolutional spatial propagation network that constructs an affinity matrix to refine the final depth map. In addition, most existing methods yield low reconstruction quality in unreliable regions that do not satisfy multi-view consistency; to address this, an attention mechanism is incorporated into a sample-adaptive dynamic feature extraction network that enhances the model's local perception. Experimental results show that on the DTU dataset, SP-MVSNet improves reconstruction completeness by 32.8% and overall quality by 11.4% compared with CVP-MVSNet. On the Tanks and Temples benchmark and the BlendedMVS dataset, SP-MVSNet also outperforms most known methods and achieves good 3D reconstruction results.

Key words: stereo vision, spatial propagation, dense point-cloud reconstruction, attention mechanism, depth estimation
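The abstract frames depth estimation as multi-label classification over discrete depth hypotheses, supervised by a cross-entropy loss on the regularized probability volume. The minimal Python sketch below illustrates that idea for a few pixels under stated assumptions: the probability volume is a per-pixel softmax over regularized scores, and the class label is the hypothesis nearest to the ground-truth depth. The helper names (`softmax`, `nearest_hypothesis`, `depth_cross_entropy`, `winner_take_all`) are hypothetical; this is not the authors' loss, which operates on full 3D volumes inside a deep-learning framework.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn one pixel's regularized scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nearest_hypothesis(depth: float, hypotheses: List[float]) -> int:
    """Index of the hypothesis closest to `depth`, used as the one-hot class label."""
    return min(range(len(hypotheses)), key=lambda d: abs(hypotheses[d] - depth))


def depth_cross_entropy(prob: List[List[float]], gt_depth: List[float],
                        hypotheses: List[float]) -> float:
    """Mean cross-entropy over pixels. prob[p][d] is pixel p's probability of lying
    at hypotheses[d]; the target class is the hypothesis nearest to gt_depth[p]."""
    eps = 1e-12  # avoids log(0) when the target probability underflows
    losses = [-math.log(max(p[nearest_hypothesis(g, hypotheses)], eps))
              for p, g in zip(prob, gt_depth)]
    return sum(losses) / len(losses)


def winner_take_all(prob: List[List[float]], hypotheses: List[float]) -> List[float]:
    """Classification-style readout: the hypothesis with the highest probability."""
    return [hypotheses[max(range(len(p)), key=p.__getitem__)] for p in prob]


hypotheses = [1.0, 2.0, 3.0, 4.0]   # illustrative depth values
scores = [[0.1, 2.0, 0.3, -1.0],    # pixel 0: highest score at depth 2.0
          [3.0, 0.0, 0.0, 0.0]]     # pixel 1: highest score at depth 1.0
prob = [softmax(s) for s in scores]
print(winner_take_all(prob, hypotheses))               # [2.0, 1.0]
print(depth_cross_entropy(prob, [2.1, 1.2], hypotheses))
```

The depth is read off with an argmax (winner-take-all) instead of a probability-weighted average, which is what separates this classification view from regression-style soft-argmin depth.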
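The coarse-to-fine part of the hybrid depth hypothesis strategy can be illustrated with a generic cascade: sample hypotheses uniformly over the full depth range at the coarsest stage, then resample a narrower range centred on each pixel's previous estimate. The abstract does not say how the "hybrid" strategy mixes sampling schemes or where spatial propagation enters, so this Python sketch shows only the standard cascade pattern; the function names, the depth range and the `shrink=0.5` factor are assumptions.

```python
from typing import List


def uniform_hypotheses(d_min: float, d_max: float, num: int) -> List[float]:
    """Coarse stage: `num` depth hypotheses evenly spaced over [d_min, d_max]."""
    step = (d_max - d_min) / (num - 1)
    return [d_min + i * step for i in range(num)]


def refined_hypotheses(center: float, prev_interval: float, num: int,
                       shrink: float = 0.5) -> List[float]:
    """Finer stage: `num` hypotheses centred on the previous stage's estimate,
    with the sampling interval multiplied by `shrink` (an assumed factor)."""
    interval = prev_interval * shrink
    half = (num - 1) / 2
    return [center + (i - half) * interval for i in range(num)]


# Illustrative two-stage run for a single pixel (all numbers are made up).
coarse = uniform_hypotheses(1.0, 10.0, 10)   # interval 1.0
coarse_estimate = 4.0                        # e.g. argmax of the coarse probability volume
fine = refined_hypotheses(coarse_estimate, 1.0, 5)
print(coarse)
print(fine)   # [3.0, 3.5, 4.0, 4.5, 5.0]
```

Each finer stage spends the same number of hypotheses on a smaller interval, so depth resolution increases without enlarging the cost volume.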
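The spatial perception optimization module is described as a confidence check followed by a convolutional spatial propagation network (CSPN, reference 20) that refines depth through an affinity matrix. The pure-Python sketch below implements one 3x3 CSPN update: raw neighbour affinities are divided by the sum of their absolute values, and the centre pixel takes the remaining weight so that all weights sum to 1. In the real module the affinities are predicted from the semantic guidance features; here they are toy inputs. The abstract does not state how the confidence check interacts with propagation, so the gating used here (re-imposing the initial depth at confident pixels after every step, borrowed from CSPN's sparse-depth replacement) is an assumption, as are all function names.

```python
from typing import List

Grid = List[List[float]]
# The 8 neighbours of a 3x3 propagation kernel, as (dy, dx) offsets.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def cspn_step(depth: Grid, affinity: List[List[List[float]]]) -> Grid:
    """One 3x3 CSPN step. affinity[y][x][k] is the raw affinity between pixel (y, x)
    and neighbour OFFSETS[k]; raw values are divided by their absolute sum and the
    centre pixel takes the remaining weight, so the weights sum to 1."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            raw = affinity[y][x]
            norm = sum(abs(a) for a in raw) or 1.0
            acc, wsum = 0.0, 0.0
            for (dy, dx), a in zip(OFFSETS, raw):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:  # skip neighbours outside the image
                    acc += (a / norm) * depth[ny][nx]
                    wsum += a / norm
            out[y][x] = (1.0 - wsum) * depth[y][x] + acc
    return out


def confidence_gated_refine(depth0: Grid, affinity: List[List[List[float]]],
                            confidence: Grid, threshold: float = 0.5,
                            iterations: int = 4) -> Grid:
    """Propagate with CSPN while re-imposing the initial depth at every pixel
    whose confidence passes the (assumed) threshold check."""
    d = [row[:] for row in depth0]
    for _ in range(iterations):
        d = cspn_step(d, affinity)
        for y, row in enumerate(confidence):
            for x, c in enumerate(row):
                if c >= threshold:
                    d[y][x] = depth0[y][x]
    return d


depth0 = [[1.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]  # centre is an outlier
confidence = [[0.9, 0.9, 0.9], [0.9, 0.1, 0.9], [0.9, 0.9, 0.9]]
affinity = [[[1.0] * 8 for _ in range(3)] for _ in range(3)]  # uniform toy affinities
result = confidence_gated_refine(depth0, affinity, confidence)
print(result)  # the unreliable centre is pulled to its neighbours' depth of 1.0
```

Neighbours outside the image are skipped and the centre absorbs their weight; this is a simplifying choice, and zero padding is another common option.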
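The paper's Fig.2 shows an ODConv structure and the reference list includes omni-dimensional dynamic convolution (reference 19), which suggests that the sample-adaptive feature extraction network builds on it. ODConv learns four attentions (spatial, input-channel, output-channel and kernel-wise) and multiplies them into a set of candidate kernels. The Python sketch below keeps only the kernel-wise attention on a single-channel input, so it is closer to plain dynamic convolution than to full ODConv; the one-layer attention head, the toy kernels and all names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def kernel_attention(x: Grid, attn_w: List[float], attn_b: List[float]) -> List[float]:
    """Per-sample weights over the candidate kernels: global average pooling,
    a hypothetical one-layer linear head, then softmax."""
    pooled = sum(map(sum, x)) / (len(x) * len(x[0]))
    logits = [w * pooled + b for w, b in zip(attn_w, attn_b)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv2d(x: Grid, kernels: List[Grid],
                   attn_w: List[float], attn_b: List[float]) -> Grid:
    """Blend the k x k candidate kernels with this sample's attention weights,
    then apply the blended kernel as a 'valid' 2-D cross-correlation."""
    alphas = kernel_attention(x, attn_w, attn_b)
    k = len(kernels[0])
    blended = [[sum(a * kern[i][j] for a, kern in zip(alphas, kernels))
                for j in range(k)] for i in range(k)]
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(blended[i][j] * x[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(out_w)] for r in range(out_h)]


identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero = [[0.0] * 3 for _ in range(3)]
attn_w, attn_b = [1.0, -1.0], [0.0, 0.0]        # toy attention head
bright = [[2.0] * 3 for _ in range(3)]
dark = [[-2.0] * 3 for _ in range(3)]
print(kernel_attention(bright, attn_w, attn_b))  # weights mostly on the identity kernel
print(kernel_attention(dark, attn_w, attn_b))    # weights mostly on the zero kernel
print(dynamic_conv2d(bright, [identity, zero], attn_w, attn_b))
```

Because the attention weights are computed from the input itself, inputs with different statistics are convolved with different effective kernels, which is the sense in which such a network is sample-adaptive.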

Xiying ZHANG, Shoudong SUN, Haihao YU, Jilong BIAN. Spatial Propagation-based Multi-View 3D Reconstruction[J]. Computer Engineering, 2024, 50(7): 293-302.

https://www.ecice06.com/CN/Y2024/V50/I7/293

Figures/Tables (16)

Fig.1 SP-MVSNet network structure

Fig.2 ODConv structure

Fig.3 Schematic diagram of the hybrid depth hypothesis strategy

Fig.4 SSANet structure

Fig.5 Confidence map

Fig.6 Comparison of point-cloud visualizations for Scan9 in the DTU dataset

Fig.7 Comparison of point-cloud visualizations for Scan12 in the DTU dataset

Fig.8 Point-cloud visualization results for the remaining scenes in the DTU dataset

Fig.9 Qualitative results on the Intermediate set of the Tanks and Temples benchmark

Fig.10 Reconstruction results on the BlendedMVS dataset

References (37)

1	GALLIANI S, LASINGER K, SCHINDLER K. Massively parallel multiview stereopsis by surface normal diffusion[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2015: 873-881.
2	XU Q S, KONG W H, TAO W B, et al. Multi-scale geometric consistency guided and planar prior assisted multi-view stereo[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 16(3): 36-45.
3	YIN C Y, ZHI H H, LI H B. Survey of binocular stereo-matching methods based on deep learning[J]. Computer Engineering, 2022, 48(10): 1-12. (in Chinese)
4	YAO Y, LUO Z X, LI S W, et al. MVSNet: depth inference for unstructured multi-view stereo[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 785-801.
5	GU X D, FAN Z W, ZHU S Y, et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 2495-2504.
6	LIU H J, BAI Z Y, CHENG W, et al. Fusion attention mechanism and multilayer U-Net for multiview stereo[J]. Journal of Image and Graphics, 2022, 27(2): 475-485. (in Chinese)
7	YANG J Y, MAO W, ALVAREZ J M, et al. Cost volume pyramid based depth inference for multi-view stereo[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 4877-4886.
8	GAO S Y, LI Z X, WANG Z Q. Cost volume pyramid network with multi-strategies range searching for multi-view stereo[EB/OL]. [2023-06-05]. https://arxiv.org/abs/2207.12032.
9	CHENG S, XU Z X, ZHU S L, et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 2524-2534.
10	LI J, CHEN Y H. A depth estimation method for multi-view and high-precision images[J]. Journal of Beijing University of Posts and Telecommunications, 2021, 44(5): 101-106. (in Chinese)
11	CAO C, REN X, FU Y. MVSFormer: multi-view stereo with pre-trained Vision Transformers and temperature-based depth[EB/OL]. [2023-06-16]. https://arxiv.org/abs/2208.02541.
12	LUO X, XIE Y P. FFP-MVSNet: feature fusion based patchmatch for multi-view stereo[EB/OL]. [2023-06-16]. https://link.springer.com/chapter/10.1007/978-981-99-1260-5_21.
13	YU Z H, GAO S H. Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 1949-1958.
14	PENG R, WANG R J, WANG Z Y, et al. Rethinking depth estimation for multi-view stereo: a unified representation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 8635-8644.
15	YAO Y, LUO Z X, LI S W, et al. Recurrent MVSNet for high-resolution multi-view stereo depth inference[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 5525-5534.
16	YANG J Y, ALVAREZ J M, LIU M M. Non-parametric depth distribution modelling based depth inference for multi-view stereo[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 8626-8634.
17	LIU W J, WANG J K, QU H C. Multi-scale cost volumes information sharing based multi-view stereo reconstructed model[J]. Journal of Image and Graphics, 2022, 27(11): 3331-3342. (in Chinese)
18	BLEYER M, RHEMANN C, ROTHER C. PatchMatch stereo - stereo matching with slanted support windows[C]//Proceedings of the British Machine Vision Conference. Washington D.C., USA: IEEE Press, 2011: 1-11.
19	LI C, ZHOU A, YAO A. Omni-dimensional dynamic convolution[EB/OL]. [2023-06-05]. https://arxiv.org/abs/2209.07947.
20	CHENG X J, WANG P, YANG R G. Depth estimation via affinity learned with convolutional spatial propagation network[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 108-125.
21	AANÆS H, JENSEN R R, VOGIATZIS G, et al. Large-scale data for multiple-view stereopsis[J]. International Journal of Computer Vision, 2016, 120(2): 153-168. doi: 10.1007/s11263-016-0902-9
22	KNAPITSCH A, PARK J, ZHOU Q Y, et al. Tanks and Temples[J]. ACM Transactions on Graphics, 2017, 36(4): 1-13.
23	YAO Y, LUO Z X, LI S W, et al. BlendedMVS: a large-scale dataset for generalized multi-view stereo networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 1790-1799.
24	PASZKE A, GROSS S, CHINTALA S, et al. Automatic differentiation in PyTorch[EB/OL]. [2023-06-05]. https://gwern.net/doc/www/openreview.net/54b149cefe1fbbe841975209b4840fa04086a701.pdf.
25	FURUKAWA Y, PONCE J. Accurate, dense, and robust multiview stereopsis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(8): 1362-1376. doi: 10.1109/TPAMI.2009.161
26	SCHONBERGER J L, FRAHM J M. Structure-from-motion revisited[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2016: 4104-4113.
27	YU A Z, GUO W Y, LIU B, et al. Attention aware cost volume pyramid based multi-view stereo network for 3D reconstruction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175: 448-460. doi: 10.1016/j.isprsjprs.2021.03.010
28	WANG F, GALLIANI S, VOGEL C, et al. PatchmatchNet: learned multi-view patchmatch stereo[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2021: 14194-14203.
29	XU Q, OSWALD M R, TAO W, et al. Non-local recurrent regularization networks for multi-view stereo[J]. IEEE Access, 2021, 6: 132586-132597.
30	WEILHARTER R, FRAUNDORFER F. ATLAS-MVSNet: attention layers for feature extraction and cost volume regularization in multi-view stereo[C]//Proceedings of the 26th International Conference on Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 3557-3563.
31	MA X J, GONG Y, WANG Q R, et al. EPP-MVSNet: epipolar-assembling based depth prediction for multi-view stereo[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2021: 5732-5740.
32	WANG L K, GONG Y, MA X J, et al. IS-MVSNet: importance sampling-based MVSNet[EB/OL]. [2023-06-05]. https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136920663.pdf.
33	WANG F, GALLIANI S, VOGEL C, et al. IterMVS: iterative probability estimation for efficient multi-view stereo[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 8606-8615.
34	XI J H, SHI Y F, WANG Y J, et al. RayMVSNet: learning ray-based 1D implicit fields for accurate multi-view stereo[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 8595-8605.
35	ZHANG X D, YANG F Z, CHANG M, et al. MG-MVSNet: multiple granularities feature fusion network for multi-view stereo[J]. Neurocomputing, 2023, 528: 35-47. doi: 10.1016/j.neucom.2023.01.062
36	LUO K Y, GUAN T, JU L L, et al. P-MVSNet: learning patch-wise matching confidence aggregation for multi-view stereo[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2019: 10452-10461.
37	CHEN R, HAN S F, XU J, et al. Visibility-aware point-based multi-view stereo network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3695-3708.
