Computer Engineering ›› 2018, Vol. 44 ›› Issue (5): 262-267. doi: 10.19678/j.issn.1000-3428.0046946

• Multimedia Technology and Application •

Research on Speaker Aware Training Method Based on Improved i-vector

LIANG Yulong (梁玉龙), QU Dan (屈丹), QIU Zeyu (邱泽宇)

  1. School of Information and Systems Engineering, PLA Information Engineering University, Zhengzhou 450002, China
  • Received: 2017-04-25 Online: 2018-05-15 Published: 2018-05-15
  • About the authors: LIANG Yulong (born 1991), male, master's student, research interests: speech recognition and machine learning; QU Dan, associate professor and doctoral supervisor; QIU Zeyu, master's student.
  • Supported by:
    National Natural Science Foundation of China (61673395, 61403415); Natural Science Foundation of Henan Province (162300410331).

Abstract: Speaker aware training based on the identity vector (i-vector) extracts the i-vector from MFCC input features, but the relatively poor robustness of MFCC degrades the recognition performance of this training method. To address this problem, an improved i-vector based speaker aware training method is proposed. A low-dimensional feature extraction method based on SVD is designed, and the features it extracts replace MFCC as the input for extracting i-vectors with better representational ability. Experimental results show that, on the Vystadial_cz (Czech) corpus, the recognition performance of this method improves by 1.62% and 1.52% compared with the DNN-HMM speech recognition system and the original i-vector based speaker aware training method, respectively; on the WSJ corpus, the corresponding improvements are 3.9% and 1.48%.
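The abstract does not spell out how the SVD-based low-dimensional features are computed. As a rough illustration only, the sketch below assumes one common approach: factorizing a trained layer's weight matrix with a truncated SVD and using the resulting linear projection as a bottleneck feature extractor. The function names, matrix sizes, and the rank of 13 are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Split weight (m x n) into a (m x rank) and b (rank x n) via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # absorb the singular values into the left factor
    b = vt[:rank, :]
    return a, b


def bottleneck_features(frames, a):
    """Project frames (num_frames x m) onto the rank-dimensional linear bottleneck."""
    return frames @ a


# Hypothetical data: random values stand in for a trained layer and its inputs.
rng = np.random.default_rng(0)
weight = rng.standard_normal((40, 64))   # assumed layer: 40 inputs, 64 hidden units
frames = rng.standard_normal((100, 40))  # assumed 100 frames of 40-dim input

a, b = low_rank_factorize(weight, rank=13)
features = bottleneck_features(frames, a)
print(features.shape)  # (100, 13)
```

Under this assumption, the product `a @ b` is the best rank-13 approximation of the original weight matrix, and the 13-dimensional `features` would take the place of MFCC as the input to i-vector extraction.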

Key words: speaker aware training, i-vector, Deep Neural Network (DNN), Singular Value Matrix Decomposition (SVMD), bottleneck feature

CLC Number: