
Computer Engineering ›› 2024, Vol. 50 ›› Issue (6): 255-265. doi: 10.19678/j.issn.1000-3428.0068130

• Graphics and Image Processing •

Facial Animation Algorithm Based on Improved Thin Plate Spline Motion Model

YANG Shuo, WANG Yiding

  1. School of Information, North China University of Technology, Beijing 100144, China
  • Received: 2023-07-24 Revised: 2023-10-01 Published: 2023-10-30
  • Corresponding author: WANG Yiding, E-mail: wangyd@ncut.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62276018).

Abstract: Facial animation plays a key role in film, games, and virtual reality, where realistic and vivid facial motion is essential for conveying emotion. When facial shape, pose, and expression all vary, the nonlinear thin plate spline transformation yields good motion estimates, but the estimates are too coarse for complex facial textures and mouth movements, and stronger image inpainting capability is needed. To address this, the paper proposes a facial animation algorithm based on an improved Thin Plate Spline Motion Model (TPSMM). First, a Farneback optical flow pyramid algorithm is introduced into TPSMM and combined with the thin plate spline transformations and the background affine transformation, making local facial motion estimation more accurate. Second, to recover detailed texture in missing regions more realistically, a multi-scale detail perception network is proposed. In the encoder, embedded channel attention (ECA) modules reduce the loss of facial detail caused by repeated downsampling of the source image. In the decoder, coordinate attention (CA) modules capture important features at different positions in the motion estimation feature maps, improving the quality of the generated face images. Experimental results show that, compared with the First Order Motion Model (FOMM), Motion Representations for Articulated Animation (MRAA), and TPSMM, the proposed algorithm achieves the best L1, Average Keypoint Distance (AKD), and Average Euclidean Distance (AED) values on the MUG, UvA-Nemo, and Oulu-CASIA datasets, averaging 0.0129, 0.923, and 0.00099, respectively.

Key words: facial animation, optical flow estimation, thin plate spline, multi-scale feature fusion, channel attention mechanism, coordinate attention mechanism
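
The three metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in the image animation literature: L1 as the mean absolute pixel difference between generated and ground-truth frames, AKD as the mean distance between corresponding facial keypoints, and AED as the mean Euclidean distance between identity embeddings. The abstract does not give the evaluation protocol, keypoint detector, or identity network, so the function names and toy inputs below are hypothetical and only show the arithmetic.

```python
import math


def l1_error(generated, target):
    """Mean absolute difference between paired pixel values (flat sequences)."""
    if len(generated) != len(target) or not generated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(generated)


def mean_point_distance(points_a, points_b):
    """Mean Euclidean distance between paired points of equal dimension.

    With 2D keypoints this is an AKD-style score; with identity
    embedding vectors (one per frame) it is an AED-style score.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.dist(a, b) for a, b in zip(points_a, points_b))
    return total / len(points_a)


# Toy data (hypothetical): four pixel intensities in [0, 1].
generated_pixels = [0.0, 0.5, 1.0, 0.25]
target_pixels = [0.0, 0.25, 1.0, 0.75]
l1 = l1_error(generated_pixels, target_pixels)

# Toy data (hypothetical): two keypoints per frame, in pixel coordinates.
generated_kp = [(0.0, 0.0), (3.0, 4.0)]
target_kp = [(0.0, 0.0), (0.0, 0.0)]
akd = mean_point_distance(generated_kp, target_kp)

# Toy data (hypothetical): one 3-dimensional identity embedding per frame.
generated_emb = [(1.0, 0.0, 0.0)]
target_emb = [(0.0, 0.0, 0.0)]
aed = mean_point_distance(generated_emb, target_emb)

print(l1, akd, aed)
```

Lower is better for all three scores, which is the sense in which the abstract reports the averages 0.0129, 0.923, and 0.00099 as the best results.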
