Speech Driven Facial Animation Synthesis Based on State Asynchronous DBN

doi:10.3969/j.issn.1000-3428.2014.02.039

Computer Engineering

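ZHAO Yong¹, JIANG Dong-mei¹, Sahli Hichem²

(1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China; 2. Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel, Brussels 1050, Belgium)

Received: 2013-01-01 Publication date: 2014-02-15 Release date: 2014-02-13
About the authors: ZHAO Yong (b. 1988), male, master's student, main research interest: visual speech synthesis; JIANG Dong-mei and Sahli Hichem, professors
Funding:
National Natural Science Foundation of China (61273265); Key Project of the Shaanxi Provincial International Science and Technology Cooperation Fund (2011KW-04)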


Abstract

This paper proposes a speech-driven facial animation synthesis method based on an audio-visual Dynamic Bayesian Network model with State Asynchrony (SA-DBN), which converts acoustic speech into photo-realistic facial animation. Perceptual Linear Prediction (PLP) features extracted from the audio and Active Appearance Model (AAM) features extracted from the face images of an audio-visual speech database are used to train the model parameters. Given an input audio stream, the optimal AAM feature sequence is estimated under the Maximum Likelihood Estimation (MLE) criterion and is then used to synthesize the facial image sequence and the facial animation. A subjective evaluation compares the SA-DBN with a state-synchronous audio-visual DBN model. The results show that, by limiting the maximum degree of asynchrony between the audio speech states and the visual speech states, the SA-DBN produces clear, natural facial animations whose mouth movements closely match the input speech.

Key words: facial animation synthesis, Dynamic Bayesian Network model with State Asynchrony (SA-DBN), asynchrony constraint, Active Appearance Model (AAM), Perceptual Linear Prediction (PLP), Maximum Likelihood Estimation (MLE)
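The asynchrony constraint named in the abstract can be illustrated with a small sketch. It is not the authors' model: it assumes, purely for illustration, that the audio and visual streams each move through the same number of left-to-right states and that the constraint is a bound on the difference between the two state indices. The function names `allowed_state_pairs` and `allowed_transitions` and the parameter `max_async` are hypothetical.

```python
from itertools import product


def allowed_state_pairs(n_states, max_async):
    """Return joint (audio_state, visual_state) pairs whose indices
    differ by at most max_async (assumed form of the constraint)."""
    return [
        (a, v)
        for a, v in product(range(n_states), repeat=2)
        if abs(a - v) <= max_async
    ]


def allowed_transitions(n_states, max_async):
    """Map each allowed joint state to its allowed successors.

    Assumption: each stream is left-to-right, so per frame it either
    stays in its state or advances by one. A move is kept only if the
    destination pair still satisfies the asynchrony bound.
    """
    pairs = set(allowed_state_pairs(n_states, max_async))
    moves = {}
    for a, v in pairs:
        candidates = [(a + da, v + dv) for da in (0, 1) for dv in (0, 1)]
        moves[(a, v)] = [p for p in candidates if p in pairs]
    return moves
```

With `max_async=0` the joint state space collapses to the diagonal, which corresponds to a state-synchronous model; a larger bound lets one stream lead or lag the other by a limited number of states.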

CLC number: TP18

ZHAO Yong, JIANG Dong-mei, Sahli Hichem. Speech Driven Facial Animation Synthesis Based on State Asynchronous DBN[J]. Computer Engineering.

https://www.ecice06.com/CN/Y2014/V40/I2/180



