Image Inpainting Method for Cultural Relics Based on Self-Attention Mechanism and Dynamic Masking Mechanism

doi:10.19678/j.issn.1000-3428.0070181

Abstract

Convolutional networks have two weaknesses in cultural relic image inpainting. The limited local receptive field of the convolution kernel gives them a weak grasp of global context and complex structures, and the translation invariance of the convolution operation handles the intricate geometry of relic surfaces poorly, so convolution-based inpainting is prone to irrelevant structures and artifacts. Transformer models with self-attention, in turn, pay insufficient attention to specific regions when processing the details and local features of relic images, which makes it difficult to extract sufficiently deep features and reduces the precision and fineness of the result; they also capture long-range semantics inadequately, which degrades the visual quality of the inpainted images. This paper proposes DMSWT (Dynamic Mask on SwinTransformer), a relic image inpainting model based on the SwinTransformer. The model optimizes the network structure through several changes to its self-attention module. First, layer normalization is removed and fully connected layers are replaced with residual connections to strengthen deep feature extraction. Second, a dynamic mask mechanism is introduced to mitigate the loss of effective pixels caused by default sampling when inpainting images with large missing regions. Finally, the loss function is revised to emphasize perceptual quality, improving the visual quality of the inpainted images. Experiments across different scenarios show that DMSWT learns more structural prior information and generates inpainted images that match real-world intuition, and quantitative evaluation shows clear improvements in the metrics.

Key words: cultural relic image inpainting, deep learning, self-attention mechanism, convolutional network, masking mechanism
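
The abstract does not spell out the update rule behind the dynamic mask mechanism. As a rough illustration only, the sketch below assumes a rule in the style of partial convolution, where a missing pixel becomes valid once any pixel in its neighborhood is valid, so the known region grows with each pass. The function name `update_mask` and the `window` parameter are hypothetical and may not match the DMSWT implementation.

```python
import numpy as np


def update_mask(mask: np.ndarray, window: int = 3) -> np.ndarray:
    """Return a mask where a pixel is valid (1) if any pixel in its
    window x window neighborhood was valid in the input mask.

    Assumed rule for illustration; not taken from the paper.
    """
    pad = window // 2
    # Zero padding treats everything outside the image as missing.
    padded = np.pad(mask, pad, mode="constant", constant_values=0)
    updated = np.zeros_like(mask)
    height, width = mask.shape
    for i in range(height):
        for j in range(width):
            neighborhood = padded[i:i + window, j:j + window]
            updated[i, j] = 1 if neighborhood.any() else 0
    return updated


# Toy 5 x 5 mask: only the top row is known (1 = valid, 0 = missing).
mask = np.zeros((5, 5), dtype=np.uint8)
mask[0, :] = 1

step1 = update_mask(mask)   # valid region grows to rows 0-1
step2 = update_mask(step1)  # valid region grows to rows 0-2
print(step1.sum(), step2.sum())
```

Under this assumed rule the valid region expands by one pixel per pass for a 3 x 3 window, which is one way the number of effective pixels could be kept from collapsing when the missing region is large.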

HU Kangyuan, GUO Tao, MU Nan. Image Inpainting Method for Cultural Relics Based on Self-Attention Mechanism and Dynamic Masking Mechanism[J]. Computer Engineering, 2026, 52(6): 179-188.

https://www.ecice06.com/CN/Y2026/V52/I6/179

Figures/Tables 13

Fig.1 Comparison of SwinTransformer and ViT

Fig.2 Overall architecture of the RMSWT network

Fig.3 Multi-head self-attention module

Fig.4 Simple example of mask update

Fig.5 Comparison of inpainting results of different algorithms on real-world damaged murals

Fig.6 Inpainting of images of actual damaged cultural relics

Fig.7 Comparison of anomalies in evaluation metrics

Fig.8 Comparison of LPIPS values for different methods

Fig.9 PSNR results on the test set at different mask rates

Fig.10 SSIM results on the test set at different mask rates
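
PSNR, one of the metrics reported against mask rate in the figures above, can be computed as in the minimal sketch below. It uses the standard definition for 8-bit images; the helper names `psnr` and `mask_rate` are hypothetical, and the convention that 1 marks a known pixel is an assumption rather than something stated on this page.

```python
import numpy as np


def psnr(reference, restored, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def mask_rate(mask) -> float:
    """Fraction of missing pixels, assuming 1 = known and 0 = missing."""
    return float(1.0 - np.asarray(mask, dtype=np.float64).mean())


# Toy data: a flat gray image and a copy that is off by one gray level.
reference = np.full((8, 8), 100, dtype=np.uint8)
restored = reference + 1

mask = np.ones((8, 8), dtype=np.uint8)
mask[:, :4] = 0  # left half missing

print(round(psnr(reference, restored), 2), mask_rate(mask))
```

Higher PSNR means the restored image is closer to the reference, so a curve of PSNR against mask rate would normally fall as more of the image is missing.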

