Computer Engineering ›› 2011, Vol. 37 ›› Issue (22): 268-269. doi: 10.3969/j.issn.1000-3428.2011.22.089

• Networks and Communications •

Audio Visual Fusion Speech Recognition Model Based on Articulatory Feature

WU Peng 1, JIANG Dong-mei 1, WANG Feng-na 2, Hichem SAHLI 2, Werner VERHELST 2   

  (1. Shaanxi Provincial Key Laboratory on Speech, Image and Information Processing, Northwestern Polytechnical University, Xi’an 710072, China; 2. Department of ETRO, Vrije Universiteit Brussel, Brussels 1050, Belgium)
  • Received: 2011-05-11 Online: 2011-11-18 Published: 2011-11-20

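  • About the authors: WU Peng (b. 1984), male, master's student, whose main research interests are speech recognition and visual speech synthesis; JIANG Dong-mei, professor; WANG Feng-na, Ph.D.; Hichem SAHLI and Werner VERHELST, professors
  • Supported by: National Natural Science Foundation of China (60703104); Natural Science Foundation of Shaanxi Province (SJ08F28); Basic Research Foundation of Northwestern Polytechnical University (JC200943)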

Abstract: A multi-stream Dynamic Bayesian Network (DBN) model based on articulatory features (AFs), named AF_AV_DBN, is proposed for audio-visual speech recognition. The conditional probability distribution of each node and the degree of asynchrony between the AFs are defined, and speech recognition experiments are carried out on an audio-visual connected-digit database. Compared with the audio-only AF_A_DBN model, the state-synchronous DBN model, and the state-asynchronous DBN model, the AF_AV_DBN model achieves the highest recognition rate under various signal-to-noise ratios and is more robust to background noise.

Key words: Dynamic Bayesian Network (DBN), articulatory feature, audio visual fusion, speech recognition, asynchronous

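Abstract (Chinese version): An audio-visual two-stream Dynamic Bayesian Network (DBN) speech recognition model based on articulatory features (AF_AV_DBN) is constructed, and the conditional probability relationships of its nodes are defined so that the states of the articulatory features can change asynchronously. Speech recognition experiments on an audio-visual speech database show that, by adjusting the asynchrony constraints between the articulatory features, the AF_AV_DBN model achieves a higher recognition rate than the state-based synchronous and asynchronous DBN models and the audio-only single-stream model, and that it is also fairly robust to noise.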

