Acoustic Model Construction Method Based on Bottleneck Compound Feature

doi:10.19678/j.issn.1000-3428.0056278

Abstract

Abstract: Mel-Frequency Cepstral Coefficient (MFCC) speech features do not effectively capture the information shared across consecutive frames. To address this problem, this paper uses a deep neural network to extract bottleneck features that reflect the long-term correlation of speech in a compact form and, on this basis, proposes a compound feature construction method that combines the neural network bottleneck features with the MFCC features, so as to improve the representation and modeling of speech. MFCC features are extracted from the speech data as the input data and then concatenated with the bottleneck (BN) features to obtain a new compound feature, on which a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) acoustic model is built. Experimental results on the TIMIT database show that, compared with methods based on the bottleneck feature alone or on the deep neural network posterior feature, the proposed method clearly improves the recognition rate.

Key words: Deep Neural Network (DNN), Mel-Frequency Cepstral Coefficient (MFCC), bottleneck feature, compound feature, Gaussian Mixture Model-Hidden Markov Model (GMM-HMM)
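
The feature construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 39-dimensional MFCC frames, the context window of 5 frames, the layer sizes, the 40-dimensional bottleneck, and the helper names `splice_frames`, `extract_bottleneck`, and `compound_features` are all assumptions chosen for illustration, and the network weights are random placeholders rather than a trained DNN. GMM-HMM training on the resulting features is not shown.

```python
import numpy as np


def splice_frames(feats, context=5):
    """Stack each frame with `context` neighbours on each side (edge frames repeated)."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    num_frames = feats.shape[0]
    return np.hstack([padded[i:i + num_frames] for i in range(2 * context + 1)])


def extract_bottleneck(x, layers):
    """Forward x through (W, b) pairs; the last pair is treated as the bottleneck layer."""
    h = x
    for W, b in layers[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))  # sigmoid hidden layers
    W, b = layers[-1]
    return h @ W + b  # linear output of the narrow bottleneck layer


def compound_features(mfcc, bn):
    """Concatenate MFCC and bottleneck features frame by frame."""
    if mfcc.shape[0] != bn.shape[0]:
        raise ValueError("MFCC and BN features must have the same number of frames")
    return np.concatenate([mfcc, bn], axis=1)


rng = np.random.default_rng(0)

# Placeholder for real MFCCs: 200 frames x 39 dimensions (assumed sizes).
mfcc = rng.standard_normal((200, 39))

# DNN input: each frame plus 5 frames of context on each side -> 11 * 39 = 429 dims.
spliced = splice_frames(mfcc, context=5)

# Hypothetical, untrained network: 429 -> 1024 -> 1024 -> 40 (bottleneck).
dims = [spliced.shape[1], 1024, 1024, 40]
layers = [(0.01 * rng.standard_normal((d_in, d_out)), np.zeros(d_out))
          for d_in, d_out in zip(dims[:-1], dims[1:])]

bn = extract_bottleneck(spliced, layers)
compound = compound_features(mfcc, bn)
print(compound.shape)  # (200, 79): 39 MFCC dims + 40 BN dims per frame
```

In a real system the bottleneck network would first be trained on labelled speech, and the compound feature matrix would then replace the plain MFCC features as the input to GMM-HMM training.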

CLC number: TP391

ZHENG Wenxiu, ZHAO Junyi, WEN Xinyi, YAO Yindi. Acoustic Model Construction Method Based on Bottleneck Compound Feature[J]. Computer Engineering, 2020, 46(11): 301-305, 314.

https://www.ecice06.com/CN/Y2020/V46/I11/301

Figures/Tables: 7

