Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 206-212, 222. doi: 10.19678/j.issn.1000-3428.0062998

• Graphics and Image Processing •

Text-to-Image Synthesis Method Based on Channel Attention Mechanism

ZHANG Yunfan, YI Yaohua, TANG Ziwei, WANG Xinyu

  1. Department of Printing and Packaging, Wuhan University, Wuhan 430079, China
  • Received: 2021-10-19 Revised: 2021-12-18 Published: 2022-04-14
  • About the authors: ZHANG Yunfan (born 1997), male, master's student, main research interest: image generation; YI Yaohua, professor, Ph.D.; TANG Ziwei, Ph.D. student; WANG Xinyu, master's student.
  • Funding:
    National Key Research and Development Program of China (2021YFB2206200).

Abstract: To address missing details in generated images and structural errors in images produced at the low-resolution stage of text-to-image synthesis, this paper proposes ECAGAN, a text-to-image method that builds on the Dynamic Attention Mechanism Generative Adversarial Network (DMGAN) and adds a content-aware upsampling module and a channel attention convolution module. When feature maps are upsampled in the low-resolution generation stage, the content-aware upsampling method computes a reassembly kernel from the input feature map and convolves the feature map with that kernel. This keeps the upsampled feature map semantically consistent with the text condition and makes the generated low-resolution image more accurate. The channel attention convolution module learns the importance of each feature channel of the feature map, emphasizing important channels and suppressing uninformative ones, which enriches the details of the generated image. In addition, conditioning augmentation and a perceptual loss function are combined during training to make training more robust and to improve the quality of the generated images. Experimental results on the CUB-200-2011 dataset show that ECAGAN reaches an Inception Score of 4.83 and an R-value of 75.62, improvements of 1.6% and 4.6%, respectively, over DMGAN. The method also reduces structural disorder in the generated images and produces clear image details, with higher semantic consistency and results closer to real images.

Key words: Generative Adversarial Network (GAN), text-to-image synthesis, channel attention mechanism, content-aware upsampling, perceptual loss
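
The channel attention idea summarized in the abstract (learn a weight per feature channel, then emphasize important channels and suppress the rest) can be illustrated with a minimal sketch. It is not the paper's module: the abstract does not give the layer design, so the sketch assumes a common pattern (global average pooling, a small 1D convolution across the channel axis, and a sigmoid gate). The function name `channel_attention` and the fixed `kernel` values are hypothetical; in a trained network the kernel weights would be learned.

```python
import math

def channel_attention(feature_map, kernel):
    """Reweight channels of a feature map (illustrative assumption, not the paper's code).

    feature_map: list of C channels, each an H x W nested list of floats.
    kernel: odd-length list of 1D convolution weights applied across channels.
    Returns (rescaled feature map, per-channel weights in (0, 1)).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # Local cross-channel interaction: zero-padded 1D convolution over the
    # pooled channel descriptors, followed by a sigmoid gate.
    pad = len(kernel) // 2
    num_channels = len(pooled)
    weights = []
    for i in range(num_channels):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < num_channels:
                acc += w * pooled[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))

    # Excite: scale every value in channel i by that channel's weight.
    scaled = [
        [[v * weights[i] for v in row] for row in ch]
        for i, ch in enumerate(feature_map)
    ]
    return scaled, weights


# Toy input: three 2 x 2 channels with means 1, 0, and -2.
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
# An identity-like kernel, chosen only so the result is easy to reason about.
scaled, weights = channel_attention(fmap, kernel=[0.0, 1.0, 0.0])
print(weights)
```

With this toy kernel each weight is the sigmoid of that channel's own mean, so the channel with the largest mean activation is kept most strongly and the channel with the negative mean is damped the most.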

CLC number: