
Computer Engineering ›› 2024, Vol. 50 ›› Issue (12): 306-317. doi: 10.19678/j.issn.1000-3428.0068380

• Graphics and Image Processing •


Artistic Font Style Transfer Based on Glyph Constraints and Attention

LÜ Wenrui1, PU Yuanyuan1,2,*, ZHAO Zhengpeng1, ZHANG Heng1, YANG Qiuxia1

  1. School of Information, Yunnan University, Kunming 650504, Yunnan, China
    2. Yunnan Province Universities Key Laboratory of Internet of Things Technology and Application, Kunming 650504, Yunnan, China
  • Received: 2023-09-12  Online: 2024-12-15  Published: 2024-03-20
  • Contact: PU Yuanyuan
  • Supported by:
    National Natural Science Foundation of China (61271361, 61761046, 62162068, 62362070); Key Project of the Applied Basic Research Program of the Yunnan Provincial Department of Science and Technology (202001BB050043); Yunnan Provincial Major Science and Technology Special Project (202302AF080006)


Abstract:

Artistic font style transfer is an intriguing yet challenging task in which the artistic style of a target font is transferred onto a source font through a learned mapping. Existing methods suffer from limited robustness in glyph style transfer and perform poorly when the glyphs of the two styles differ substantially. To address these problems, this study proposes an end-to-end general network framework that incorporates a self-attention mechanism and Adaptive Instance Normalization (AdaIN) to realize artistic style transfer of arbitrary fonts across a given set of text-effect domains. The model consists of a generator, two discriminators, and an additional style encoder. To better enforce glyph constraints and improve network performance, several loss functions are designed to optimize the training of the Generative Adversarial Network (GAN). The model is validated on the publicly available artistic font dataset from the FET-GAN task and compared against six state-of-the-art methods, both quantitatively and qualitatively. Experimental results show that the model performs glyph image style transfer with font transformation while preserving the glyph structure well, achieving a Fréchet Inception Distance (FID) of 72.355, which is lower than 91.435, the best result among the compared methods.
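The abstract names Adaptive Instance Normalization (AdaIN) as one of the model's building blocks. As an illustration only, and not the authors' exact implementation, the sketch below shows the standard AdaIN operation in PyTorch: the content feature map is normalized per channel and re-scaled with the channel-wise statistics of the style feature map. In a style-transfer generator of this kind, the style statistics would typically come from the style encoder mentioned above.

    import torch

    def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        """Standard Adaptive Instance Normalization over (N, C, H, W) feature maps."""
        # Channel-wise mean/std over the spatial dimensions of each sample
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
        s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps

        # Normalize the content features, then re-shift them with the style statistics
        normalized = (content_feat - c_mean) / c_std
        return normalized * s_std + s_mean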

Key words: font style transfer, self-attention, adaptive instance normalization, Generative Adversarial Network (GAN), glyph constraints
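For reference, the reported FID of 72.355 is the standard Fréchet Inception Distance between the feature statistics of generated and real images. A minimal NumPy/SciPy sketch of that distance is given below; it assumes the Inception-feature means and covariances of the two image sets have already been computed, and it is not tied to the authors' evaluation code.

    import numpy as np
    from scipy import linalg

    def frechet_inception_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
        """FID between two Gaussians fitted to Inception features:
        ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
        diff = mu1 - mu2
        # Matrix square root of the covariance product
        covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
        if not np.isfinite(covmean).all():
            # Stabilize nearly singular covariances with a small diagonal offset
            offset = np.eye(sigma1.shape[0]) * eps
            covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
        if np.iscomplexobj(covmean):
            # Discard tiny imaginary components caused by numerical error
            covmean = covmean.real
        return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))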