
Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 206-212,222. doi: 10.19678/j.issn.1000-3428.0062998

• Graphics and Image Processing •

Text-to-Image Synthesis Method Based on Channel Attention Mechanism

ZHANG Yunfan, YI Yaohua, TANG Ziwei, WANG Xinyu   

  1. Department of Printing and Packaging, Wuhan University, Wuhan 430079, China
  • Received:2021-10-19 Revised:2021-12-18 Published:2022-04-14

  • About the authors: ZHANG Yunfan (b. 1997), male, master's student; main research interest: image generation. YI Yaohua, professor, Ph.D. TANG Ziwei, Ph.D. candidate. WANG Xinyu, master's student.
  • Funding: National Key Research and Development Program of China (2021YFB2206200).

Abstract: To address the lack of detail in generated images and the structural errors produced at the low-resolution stage of text-to-image generation, a content-aware upsampling module and a channel attention convolution module are introduced into the Dynamic Attention Mechanism Generative Adversarial Network (DMGAN), yielding a new text-to-image synthesis method, ECAGAN. During feature-map upsampling in the low-resolution generation stage, a content-aware upsampling method computes reassembly kernels from the input feature map and uses them, together with the feature map, in the convolution operation; this ensures semantic consistency between the upsampled feature map and the text condition and makes the generated low-resolution images more accurate. The channel attention convolution module learns the importance of each feature channel of the feature map, emphasizing informative channels and suppressing invalid information to enrich the details of the generated images. In addition, conditioning augmentation and a perceptual loss function are combined during training to improve the robustness of the training process and the quality of the generated images. Experimental results on the CUB-200-2011 dataset show that ECAGAN achieves an Inception Score of 4.83 and an R-value of 75.62, improvements of 1.6% and 4.6%, respectively, over DMGAN. The proposed method corrects structural errors in the generated images while producing clearer details and higher semantic consistency, yielding images closer to real ones.
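The paper's content-aware upsampling module is not reproduced here, but the idea it describes — predicting reassembly kernels from the input feature map and convolving each output location's neighborhood with its own normalized kernel — can be sketched as follows. All shapes are illustrative assumptions, and the kernels would in practice be predicted by a small convolutional branch rather than supplied directly:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_aware_upsample(feat, kernels, scale=2, k=3):
    """Upsample a (C, H, W) feature map by reassembling k x k input
    neighborhoods with a content-derived kernel per output location.
    kernels: (H*scale, W*scale, k, k), softmax-normalized per location."""
    C, H, W = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros((C, H * scale, W * scale))
    for oy in range(H * scale):
        for ox in range(W * scale):
            sy, sx = oy // scale, ox // scale        # source location
            w = softmax(kernels[oy, ox].ravel())      # k*k weights, sum to 1
            patch = padded[:, sy:sy + k, sx:sx + k].reshape(C, -1)
            out[:, oy, ox] = patch @ w                # weighted reassembly
    return out
```

Because each kernel is softmax-normalized, the reassembly is a convex combination of neighboring features, which is what lets the upsampled map stay semantically consistent with its input.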
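The channel attention step the abstract describes — scoring each feature channel's importance and reweighting the map accordingly — follows the familiar squeeze-and-excitation pattern. A minimal sketch, with the two weight matrices `w1` and `w2` standing in for learned fully connected layers (assumed shapes, not the paper's parameters):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) map.
    w1: (C_r, C) reduction weights; w2: (C, C_r) expansion weights."""
    s = feat.mean(axis=(1, 2))            # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0)             # excitation: FC + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # FC + sigmoid gates in (0, 1)
    return feat * a[:, None, None]        # reweight channels
```

Informative channels receive gates near 1 and pass through nearly unchanged, while channels carrying invalid information are scaled toward zero.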

Key words: Generative Adversarial Network (GAN), text-to-image synthesis, channel attention mechanism, content-aware upsampling, perceptual loss

