A High-Fidelity 3D Reconstruction Method for Cultural Relics Based on Segmentation Prior and 3DGS
Qinghao LIANG1, Hongjuan GAO1,2*

doi:10.19678/j.issn.1000-3428.0EC0260329

Abstract

Three-dimensional (3D) reconstruction of cultural relics underpins the digital preservation, virtual exhibition, and digital restoration of cultural heritage. Compared with structured-light and laser scanning, which rely on specialized equipment and controlled acquisition environments, multi-view image-based reconstruction offers low acquisition cost, flexible operation, and low deployment requirements, making it better suited to digitizing relics in museum exhibition spaces. However, images captured in real museum collections are often degraded by complex backgrounds, glass reflections, uneven illumination, local occlusions, and limited shooting viewpoints, so the target relic is entangled in image space with display platforms, walls, and other background regions. The original 3D Gaussian Splatting (3DGS) method achieves efficient training and real-time rendering through explicit Gaussian primitives, but it is designed mainly for complete scene modeling and lacks a semantic focusing mechanism for the relic subject. Redundant background points and non-target Gaussians therefore tend to participate in optimization, increasing GPU memory consumption, training time, and model size. Abnormal elongation and artifacts may also appear around object boundaries, degrading the stable representation of the relic's geometry and texture details. To improve the accuracy and efficiency of subject reconstruction in complex museum environments, a high-fidelity 3D reconstruction method based on segmentation priors and 3DGS is proposed, in which two-dimensional subject segmentation results are introduced into the 3D Gaussian modeling process. The Segment Anything Model is used to generate subject masks of cultural relics from multi-view images.
Using the camera poses and sparse point cloud estimated by structure from motion (SfM), each 3D point is projected onto the mask plane of the corresponding view. Points that consistently fall into background regions across views are removed according to multi-view semantic consistency, yielding a cleaner and more compact subject point cloud from the initialization stage onward. During Gaussian optimization, a mask-guided constraint restricts the color reconstruction loss to the relic region, so that parameter updates focus on subject geometry, surface texture, and local details while background interference is reduced. To address the abnormal elongation of Gaussian ellipsoids caused by insufficient sampling and depth discontinuities near relic contours, an edge pruning strategy based on geometric morphological constraints is designed. Morphologically abnormal Gaussian primitives near object boundaries are identified by their major-to-minor axis ratio and removed, suppressing “black spike” artifacts and the diffusion of edge noise while improving the continuity, compactness, and visual stability of subject boundaries. Experiments on the public datasets Tanks&Temples, Mip-NeRF 360, LERF, and LLFF, as well as on a self-built cultural relic dataset, show that the proposed method achieves favorable overall performance in reconstruction accuracy, structural consistency, and perceptual quality. On the public datasets, the average PSNR, SSIM, and LPIPS are 32.99 dB, 0.977, and 0.026, respectively. On the self-built cultural relic dataset, they are 35.48 dB, 0.983, and 0.027, respectively. Compared with the original 3DGS and related methods, including LightGaussian, 3DGSR, 2DGS, Perceptual-GS, and FCGS, the proposed method produces clearer subject contours and more stable texture representations under complex background conditions.
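The multi-view filtering step described above can be illustrated with a minimal sketch. It assumes a pinhole camera model, binary foreground masks, and a simple voting rule; the function names, the `min_fg_ratio` threshold, and the `min_views` parameter are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points with a pinhole camera; return pixel coords and depth."""
    cam = points @ R.T + t                      # world frame -> camera frame
    depth = cam[:, 2]
    safe = np.where(depth > 1e-8, depth, 1.0)   # avoid dividing by zero or negative depth
    uv = (cam @ K.T)[:, :2] / safe[:, None]     # perspective division
    return uv, depth

def filter_points_by_masks(points, cameras, masks, min_fg_ratio=0.5, min_views=1):
    """Keep points whose projections fall on the foreground mask in enough views.

    cameras: list of (K, R, t) tuples; masks: list of HxW boolean arrays.
    min_fg_ratio and min_views are assumed thresholds, not values from the paper.
    """
    n = len(points)
    visible = np.zeros(n, dtype=int)   # views in which the point lands inside the image
    fg_hits = np.zeros(n, dtype=int)   # views in which it lands on the subject mask
    for (K, R, t), mask in zip(cameras, masks):
        h, w = mask.shape
        uv, depth = project_points(points, K, R, t)
        col = np.floor(uv[:, 0]).astype(int)
        row = np.floor(uv[:, 1]).astype(int)
        inside = (depth > 1e-8) & (col >= 0) & (col < w) & (row >= 0) & (row < h)
        visible += inside
        idx = np.flatnonzero(inside)
        fg_hits[idx] += mask[row[idx], col[idx]].astype(int)
    ratio = fg_hits / np.maximum(visible, 1)
    keep = (visible >= min_views) & (ratio >= min_fg_ratio)
    return points[keep], keep
```

A point that is never visible, or that lands on the background in most of the views where it is visible, is discarded before Gaussian initialization. A stricter or looser voting rule would change how aggressively background points are removed.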
Resource consumption comparisons and ablation experiments show that segmentation prior-guided point cloud filtering and edge pruning jointly reduce redundant background Gaussians and alleviate contour artifacts, significantly lowering training cost without compromising reconstruction quality. Compared with the original 3DGS, training time is reduced by approximately 60%, GPU memory consumption by approximately 40%, and model size by approximately 50%, providing a feasible solution for low-cost, efficient, and high-fidelity 3D reconstruction of museum cultural relics under uncontrolled acquisition conditions.
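The mask-guided color loss and the axis-ratio pruning criterion summarized in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the loss is a plain masked L1 term, the scales are already activated (positive) per-axis extents, and `max_ratio=10.0` and the optional `near_boundary` flag are hypothetical, since the abstract does not give the exact loss form or threshold.

```python
import numpy as np

def masked_l1_loss(rendered, target, mask, eps=1e-8):
    """Mean absolute color error computed over foreground pixels only.

    rendered, target: HxWx3 float arrays; mask: HxW boolean or 0/1 array.
    """
    m = mask.astype(np.float64)[..., None]        # HxWx1, broadcast over color channels
    abs_err = np.abs(rendered - target) * m       # background pixels contribute zero
    return abs_err.sum() / (m.sum() * rendered.shape[-1] + eps)

def elongation_keep_mask(scales, max_ratio=10.0, near_boundary=None):
    """Return a boolean keep-mask that drops overly elongated Gaussians.

    scales: Nx3 positive per-axis extents (assumed already activated, not log-scales).
    max_ratio: assumed major-to-minor axis ratio threshold, not a value from the paper.
    near_boundary: optional N boolean array restricting pruning to contour Gaussians.
    """
    major = scales.max(axis=1)
    minor = np.maximum(scales.min(axis=1), 1e-12)  # guard against zero-length axes
    elongated = (major / minor) > max_ratio
    if near_boundary is not None:
        elongated &= near_boundary                 # only prune Gaussians flagged near a contour
    return ~elongated
```

Because background pixels are zeroed before averaging, changing the background of the target image leaves the loss unchanged, which is the intended effect of the mask-guided constraint. The pruning function only decides which Gaussians to keep; how boundary proximity is determined is left open here.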

Qinghao LIANG, Hongjuan GAO. A High-Fidelity 3D Reconstruction Method for Cultural Relics Based on Segmentation Prior and 3DGS[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0EC0260329.

梁清豪, 高宏娟. 基于分割先验与3DGS的文物高保真三维重建方法[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0EC0260329.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0EC0260329
