
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 217-225. doi: 10.19678/j.issn.1000-3428.0066077

• Graphics and Image Processing •

Image-Scene Transformation Based on Generative Adversarial Networks

LUO Siqing, CHEN Hui   

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
  • Received: 2022-10-24  Revised: 2022-12-19  Published: 2023-02-08

  • About the authors: LUO Siqing (b. 1964), male, associate professor, Ph.D.; his main research interests are image processing and machine learning. CHEN Hui, master's student.
  • Funding:
    National Natural Science Foundation of China (62202092).

Abstract: Owing to constraints of time, location, photographic equipment, and other factors, it is difficult to obtain real-world images that share the same content but depict different scenes. One feasible approach is to use Generative Adversarial Networks (GAN) to transform the scenes in images without paired datasets. However, existing GAN-based image-scene transformation methods mainly address single-category, one-way transformations of scenes with simple structure. To achieve effective transformation of scenes with rich categories and highly complex semantic structure, this study proposes a GAN-based image-scene transformation model that converts images among scenes such as sunny, rainy, and foggy days. By combining a GAN with an attention module and a scene-segmentation module, the model accurately identifies and transforms Regions of Interest (ROI) while keeping the other regions unchanged. To further improve the diversity of the output, a new regularization loss is proposed to suppress latent noise. In addition, a noise-separation module is embedded in the discriminator to avoid mode collapse caused by the lack of noise constraints. Experimental results show that, compared with six baseline models including CycleGAN, UNIT, MUNIT, and NICE-GAN, the proposed model improves the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) scores by approximately 7.25% and 19% on average, respectively, and generates images with better visual quality across different scenes.
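The abstract only outlines the attention-guided design, so the following is a minimal, hypothetical PyTorch sketch of one common way such a model restricts transformation to the ROI: an attention branch predicts a soft mask that blends the translated image with the original input, so non-ROI pixels are passed through unchanged. All class names, layer widths, and the blending rule here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AttentionGatedGenerator(nn.Module):
    """Hypothetical sketch: a translation branch produces a target-scene image,
    an attention branch predicts a soft ROI mask, and the output blends the two
    so regions outside the ROI are copied from the input."""

    def __init__(self, channels=64):
        super().__init__()
        # Translation branch: maps the source-scene image to the target scene.
        self.translate = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1), nn.Tanh(),
        )
        # Attention branch: predicts a per-pixel mask in [0, 1].
        self.attention = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        translated = self.translate(x)   # candidate target-scene image
        mask = self.attention(x)         # soft ROI mask, shape (N, 1, H, W)
        # Transform only the ROI; pass the remaining regions through unchanged.
        return mask * translated + (1.0 - mask) * x, mask


# Usage sketch on a random batch of 256x256 RGB images in [-1, 1].
if __name__ == "__main__":
    gen = AttentionGatedGenerator()
    images = torch.rand(2, 3, 256, 256) * 2 - 1
    fake, roi_mask = gen(images)
    print(fake.shape, roi_mask.shape)   # (2, 3, 256, 256) and (2, 1, 256, 256)
```

The key design choice this sketch illustrates is that identity preservation of background regions is enforced structurally (through the mask-weighted blend) rather than only through a loss term, which is one plausible reading of how the attention and scene-segmentation modules keep non-ROI content unchanged.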

Key words: image processing, image transformation, Generative Adversarial Networks (GAN), scene transformation, attention mechanism

