Person Image Synthesis Based on Posture Guidance and Attribute Decomposition

doi:10.19678/j.issn.1000-3428.0063376

Abstract

Abstract: Pose-controllable person image synthesis generates a new image of a source person in a transformed pose while keeping attributes such as the coat, pants, and hairstyle consistent with the source person. Because person texture codes and human pose keypoint codes are difficult to fuse directly, some key person attributes in the generated image are poorly consistent with those in the source image. To address this, this study establishes a dual-stream generation network model under a cycle consistency constraint. In the training phase, the model adds the pose condition information of the source person to the input of the texture encoder, which reduces the search space of the decomposed component codes and improves the controllable granularity of person generation. A fusion module is designed to fuse the pose information of the source person with the style code of each decomposed component for generation and adversarial training. In addition, a cycle consistency constraint is added so that the generated image better matches the latent space. In the test phase, the network separately encodes the texture information of the source person and the pose information of the target, and the pose-transformed person image is obtained through information fusion and decoding. Qualitative and quantitative tests are conducted on the DeepFashion dataset. The results show that the Peak Signal-to-Noise Ratio (PSNR), perceptual score, and Structural Similarity (SSIM) of the model reach 31.409 dB, 3.369, and 0.768, respectively. The pose guidance condition and cycle consistency constraint added to the model simplify the probabilistic generative expression of attribute decomposition, making the texture of the generated person images more accurate and consistent with human visual perception.

Key words: person image synthesis, pose transformation, Generative Adversarial Network (GAN), human keypoint estimation, human semantic segmentation, cycle consistency
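
The abstract reports a PSNR of 31.409 dB as one of its quantitative metrics. As a rough illustration of what that figure measures, the sketch below computes PSNR with the standard definition, 10 * log10(MAX^2 / MSE). It is not the authors' evaluation code: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made here for illustration.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are given as flat sequences of pixel intensities
    (an assumption for this sketch; real pipelines use arrays).
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    source_pixels = [0, 64, 128, 255]      # hypothetical reference pixels
    generated_pixels = [0, 66, 126, 255]   # hypothetical generated pixels
    print(f"PSNR: {psnr(source_pixels, generated_pixels):.3f} dB")
```

A higher value means the generated image is numerically closer to the reference. PSNR is a purely pixel-wise measure, which is presumably why the abstract reports it alongside SSIM and a perceptual score.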

CLC Number: TP391.4

YIN Xin, ZHANG Zhancheng. Person Image Synthesis Based on Posture Guidance and Attribute Decomposition[J]. Computer Engineering, 2022, 48(11): 224-230,239.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0063376

http://www.ecice06.com/EN/Y2022/V48/I11/224
