Computer Engineering ›› 2022, Vol. 48 ›› Issue (11): 224-230,239. doi: 10.19678/j.issn.1000-3428.0063376

• Graphics and Image Processing •

Person Image Synthesis Based on Posture Guidance and Attribute Decomposition

YIN Xin, ZHANG Zhancheng   

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  • Received:2021-11-28 Revised:2022-01-19 Published:2022-11-05

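  • About the authors: YIN Xin (殷歆; born 1995), male, master's degree, main research interests: computer vision and generative adversarial networks; ZHANG Zhancheng (张战成), associate professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (61772237).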

Abstract: Pose-controllable person image synthesis generates a new image of a source person under a transformed pose, while attributes such as the coat, pants, and hairstyle must remain consistent with the source person. Because person texture codes and human pose keypoint codes are difficult to fuse directly, some key person attributes in the generated image are poorly consistent with the source image. To address this, this study establishes a dual-stream generation network model under a cycle consistency constraint. In the training phase, the model adds the pose condition information of the source person to the input of the texture encoder, thereby reducing the search space of the decomposed component codes and improving the controllable granularity of person generation. A fusion module is designed to fuse the pose information of the source person with the style code of each decomposed component for generation and adversarial training. In addition, a cycle consistency constraint is added so that the generated image matches the latent space better. In the test phase, the network encodes the texture information of the source person and the pose information of the target separately, and the pose-transformed person image is obtained through information fusion and decoding. Qualitative and quantitative tests are conducted on the DeepFashion dataset. The results show that the Peak Signal-to-Noise Ratio (PSNR), perceptual score, and Structural Similarity (SSIM) of the model reach 31.409 dB, 3.369, and 0.768, respectively. The pose guidance condition and cycle consistency constraint added to the model simplify the probabilistic generative expression of the attribute decomposition, making the textures of generated person images more accurate and consistent with human visual perception.
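
As an illustration of two quantities named in the abstract, the following minimal sketch computes PSNR from its standard definition and a mean-absolute-error cycle consistency term. The function names `psnr` and `cycle_l1`, the flat pixel-sequence representation, the 8-bit peak value of 255, and the choice of an L1 distance for the cycle term are all assumptions made for illustration; the abstract does not specify the exact form of the loss used in the model.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences.

    Assumes pixel values lie in [0, max_value] (255 for 8-bit images).
    """
    if len(reference) != len(generated):
        raise ValueError("images must contain the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def cycle_l1(source, reconstructed):
    """Mean absolute error between a source image and its cycle reconstruction.

    Hypothetical form of a cycle consistency term: the source image is mapped
    to the target pose and then back to the source pose, and the result is
    compared with the original. The L1 distance is an assumption.
    """
    if len(source) != len(reconstructed):
        raise ValueError("images must contain the same number of pixels")
    return sum(abs(s - r) for s, r in zip(source, reconstructed)) / len(source)
```

A higher PSNR means the generated image is closer to the reference, and a smaller cycle term means the pose-transformed image can be mapped back to the source more faithfully.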

Key words: person image synthesis, pose transformation, Generative Adversarial Network (GAN), human keypoint estimation, human semantic segmentation, cycle consistency
