
Computer Engineering


Image Inpainting Method for Cultural Relics Based on Self-attention Mechanism and Dynamic Masking Mechanism


  • Published: 2024-12-20


Abstract: When applied to cultural relic image inpainting, convolutional networks face two challenges: the convolution kernel's limited receptive field yields a weak grasp of global context and complex structures, and the translation invariance of the convolution operation handles the intricate geometry of relic surfaces poorly, making convolution-based inpainting prone to irrelevant structures and artifacts. Transformer models with self-attention, when processing the details and local features of relic images, often pay insufficient attention to specific regions and thus struggle to capture the deep features needed for precise, finely detailed restoration; they also capture long-range semantics insufficiently, resulting in suboptimal visual quality in the inpainted images. This paper proposes a relic image inpainting model based on the Swin Transformer, called Dynamic Mask on Swin Transformer (DMSWT). The model introduces several improvements to the self-attention module to optimize the network structure. First, layer normalization is removed and the fully connected layers are replaced with residual connections, strengthening the network's deep feature extraction. Second, a dynamic mask mechanism is introduced to mitigate the loss of effective pixels caused by default sampling when inpainting images with large missing regions. Finally, the loss function is improved with an emphasis on perceptual realism, enhancing the visual quality of the restored images. Experimental results in different scenarios show that DMSWT learns more structural prior information and generates inpainted images consistent with real-world intuition; quantitative evaluations likewise show clear improvements in performance metrics.
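The two architectural changes the abstract names for the self-attention module can be illustrated with a minimal NumPy sketch: attention over one window of flattened pixels, with invalid (missing-region) pixels excluded from the softmax as a stand-in for the dynamic masking, no layer normalization, and a plain residual connection in place of the feed-forward block. The function name, shapes, and masking scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_window_attention(x, w_q, w_k, w_v, valid_mask):
    """Single-head self-attention over one window of flattened pixels.

    x:          (N, C) window features
    valid_mask: (N,) bool; False marks pixels inside the missing region,
                which are excluded from the attention softmax (a stand-in
                for the paper's dynamic masking of invalid pixels).

    Following the abstract, there is no layer normalization, and the
    fully connected (feed-forward) block is replaced by a residual
    connection. Returns the attended features and attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    # Large negative score drives the softmax weight of invalid keys to ~0.
    scores = np.where(valid_mask[None, :], scores, -1e9)
    attn = softmax(scores, axis=-1)
    return x + attn @ v, attn  # residual connection instead of MLP/LayerNorm
```

In a real Swin-style model this would run per window with multiple heads and learned projections; the sketch only shows how masked keys receive (near-)zero attention weight while valid pixels still attend to each other.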

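The abstract states that the loss function is improved to emphasize perceptual realism but does not spell out its form here. As a hedged illustration only, the sketch below shows a mask-weighted reconstruction term of the kind commonly paired with perceptual objectives in inpainting work: L1 error weighted more heavily inside the missing region. The function name and the `lam_hole` weight are assumptions, not values from the paper.

```python
import numpy as np

def inpainting_loss(pred, target, hole_mask, lam_hole=6.0):
    """Mask-weighted L1 reconstruction loss (illustrative, not the
    paper's exact loss).

    pred, target: (H, W) images (or feature maps)
    hole_mask:    (H, W) bool; True marks the missing region, whose
                  reconstruction error is up-weighted by lam_hole.
    """
    l1 = np.abs(pred - target)
    valid_term = l1[~hole_mask].mean()  # error on known pixels
    hole_term = l1[hole_mask].mean()    # error inside the hole
    return valid_term + lam_hole * hole_term
```

In practice such a pixel term is typically combined with a perceptual (deep-feature) loss computed on a pretrained network, which is what pushes the restored region toward visually plausible textures rather than merely low pixel error.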