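Abstract (Chinese original, translated): An audio-visual two-stream Dynamic Bayesian Network (DBN) speech recognition model based on articulatory features (AF_AV_DBN) is constructed, and the conditional probability relations of its nodes are defined so that the articulatory feature states can change asynchronously. Speech recognition experiments on an audio-visual speech database show that, by adjusting the asynchrony constraints between the articulatory features, the AF_AV_DBN model achieves a higher recognition rate than the state-based synchronous and asynchronous DBN models and the audio-only single-stream model, and is also fairly robust to noise.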
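Keywords (Chinese original, translated):
dynamic Bayesian network,
articulatory feature,
audio-visual fusion,
speech recognition,
asynchrony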
Abstract: A multi-stream Dynamic Bayesian Network (DBN) model based on Articulatory Features (AFs), denoted AF_AV_DBN, is proposed for audio-visual speech recognition. The conditional probability distribution of each node and the degree of asynchrony between the AFs are defined, and speech recognition experiments are carried out on an audio-visual connected-digit database. Compared with the audio-only AF_A_DBN model, the state-synchronous DBN model, and the state-asynchronous DBN model, the AF_AV_DBN model achieves the highest recognition rate under various signal-to-noise ratios and is more robust to background noise.
Key words:
Dynamic Bayesian Network (DBN),
articulatory feature,
audio-visual fusion,
speech recognition,
asynchronous
吴鹏, 蒋冬梅, 王风娜, Hichem SAHLI, Werner VERHELST. 基于发音特征的音视频融合语音识别模型[J]. 计算机工程, 2011, 37(22): 268-269.
WU Peng, JIANG Dong-Mei, WANG Feng-Na, Hichem SAHLI, Werner VERHELST. Audio Visual Fusion Speech Recognition Model Based on Articulatory Feature[J]. Computer Engineering, 2011, 37(22): 268-269.