Research on Speaker Aware Training Method Based on Improved i-vector

Computer Engineering, 2018, Vol. 44, Issue (5): 262-267. doi: 10.19678/j.issn.1000-3428.0046946

LIANG Yulong, QU Dan, QIU Zeyu

School of Information and Systems Engineering, PLA Information Engineering University, Zhengzhou 450002, China

Received: 2017-04-25  Online: 2018-05-15  Published: 2018-05-15
About the authors: LIANG Yulong (born 1991), male, master's student, whose main research interests are speech recognition and machine learning; QU Dan, associate professor and doctoral supervisor; QIU Zeyu, master's student.
Funding: National Natural Science Foundation of China (61673395, 61403415); Natural Science Foundation of Henan Province (162300410331).

Abstract

Speaker aware training based on the identity vector (i-vector) extracts the i-vector from MFCC input features, but the relatively poor robustness of MFCC degrades the recognition performance of this training method. To address this problem, an improved i-vector based speaker aware training method is proposed. A low-dimensional feature extraction method based on SVD is designed, and the features it extracts replace MFCC as the input for extracting i-vectors with better representation ability. Experimental results show that, on the Vystadial_cz Czech corpus, the recognition performance of the proposed method improves by 1.62% and 1.52% compared with the DNN-HMM speech recognition system and the original i-vector based speaker aware training method, respectively; on the WSJ corpus, it improves by 3.9% and 1.48%, respectively.

Key words: speaker aware training, i-vector, Deep Neural Network (DNN), Singular Value Matrix Decomposition (SVMD), bottleneck feature
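
As an illustration of how an i-vector typically enters speaker aware training, the sketch below appends one speaker-level i-vector to every frame of acoustic features before the frames are fed to the DNN. This is the commonly described setup for i-vector based speaker aware training, not code from the paper: the function name `append_ivector`, the toy dimensions, and the values are all hypothetical, and the i-vector extractor itself (which the paper feeds with SVD-based low-dimensional features instead of MFCC) is not shown.

```python
from typing import List, Sequence


def append_ivector(frames: Sequence[Sequence[float]],
                   ivector: Sequence[float]) -> List[List[float]]:
    """Append one speaker-level i-vector to every frame of an utterance.

    frames:  per-frame acoustic features, one inner sequence per frame.
    ivector: a single i-vector shared by all frames of the same speaker.
    """
    if not ivector:
        raise ValueError("ivector must not be empty")
    # Each DNN input row is [acoustic features | i-vector].
    return [list(frame) + list(ivector) for frame in frames]


# Toy data (hypothetical sizes): 3 frames of 4-dim features, 2-dim i-vector.
frames = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2]]
ivector = [-1.0, 2.0]

dnn_input = append_ivector(frames, ivector)
print(len(dnn_input), len(dnn_input[0]))  # prints "3 6"
```

Because the same i-vector is repeated on every frame, the network sees a constant speaker descriptor alongside the time-varying acoustic features, which is why the quality of the extracted i-vector matters for recognition performance.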

CLC number: TN912

LIANG Yulong, QU Dan, QIU Zeyu. Research on Speaker Aware Training Method Based on Improved i-vector[J]. Computer Engineering, 2018, 44(5): 262-267.

http://www.ecice06.com/CN/Y2018/V44/I5/262
