Computer Engineering ›› 2019, Vol. 45 ›› Issue (3): 262-267,272. doi: 10.19678/j.issn.1000-3428.0049975

• Multimedia Technology and Application •

Small Sample Voiceprint Recognition Method Based on Deep Learning

LI Jing1a, SUN Cunwei1b, XIE Kai1a, HE Jianbiao2

  1a. School of Electronic and Information; 1b. School of Computer Science, Yangtze University, Jingzhou, Hubei 434023, China; 2. College of Information Science and Engineering, Central South University, Changsha 410083, China
  • Received: 2018-01-04 Online: 2019-03-15 Published: 2019-03-15
  • About the authors: LI Jing (born 1996), male, master's student, whose research interests are speech signal processing and image processing; SUN Cunwei (corresponding author), master's student; XIE Kai, professor and doctoral supervisor; HE Jianbiao, associate professor.
  • Supported by:

    National Natural Science Foundation of China (61272147); Hubei Provincial Department of Education Project (B2015446); Yangtze University Youth Fund (2016cqn10); College Students' Innovation and Entrepreneurship Program Fund (2017009).

Abstract:

When a Convolutional Neural Network (CNN) is trained on a small set of voiceprint samples, the network often fails to reach a good convergence state, which results in a low recognition rate. To address this, the paper proposes a new voiceprint recognition method. A deep CNN is used to extract rich, latent voiceprint features. To compensate for the shortage of training samples, an image increasing (augmentation) algorithm based on the principle of convex lens imaging is applied during CNN training. In addition, Fast Batch Normalization (FBN) is introduced into the convolution process to speed up network convergence and shorten training time. Training, validation, and testing are carried out on the TIMIT speech database, which contains the voices of 630 speakers. Experimental results show that the FBN-Alexnet network reduces training time by 48.2% compared with the original Alexnet network, and that the proposed method improves the recognition rate by 7.3%, 2.2%, and 2.8% over the GMM, GMM-UBM, and GMM-SVM methods, respectively. These results indicate that the method is effective for voiceprint recognition with small samples.

Key words: voiceprint recognition, deep learning, FBN-Alexnet network, small sample, Fast Batch Normalization (FBN), image increasing algorithm
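
The abstract names two ingredients, lens-based image augmentation and batch normalization, but does not specify either algorithm. The Python sketch below is therefore only an illustration under stated assumptions and not the authors' implementation. It assumes that augmentation means rescaling a spectrogram image by the thin-lens magnification m = f / (u - f) for several object distances u, and it shows standard batch normalization rather than FBN, whose details are not given here. All function names (`lens_magnification`, `resize_nearest`, `augment`, `batch_norm`) are hypothetical.

```python
import numpy as np

def lens_magnification(u, f):
    """Thin-lens magnification f / (u - f) for a real image (requires u > f)."""
    if u <= f:
        raise ValueError("object distance u must exceed focal length f")
    return f / (u - f)

def resize_nearest(img, scale):
    """Nearest-neighbour resize of a 2-D array by a scale factor."""
    h, w = img.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map each output index back to a source index, clipped to the image.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows[:, None], cols]

def augment(img, f, distances):
    """Assumed augmentation: one rescaled copy per object distance."""
    return [resize_nearest(img, lens_magnification(u, f)) for u in distances]

def batch_norm(x, gamma, beta, eps=1e-5):
    """Standard batch normalization (not FBN) for input shaped (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

if __name__ == "__main__":
    spectrogram = np.arange(16, dtype=float).reshape(4, 4)  # toy stand-in image
    copies = augment(spectrogram, f=1.0, distances=[1.5, 2.0, 3.0])
    print([c.shape for c in copies])

    features = np.random.default_rng(0).normal(3.0, 2.0, size=(8, 3, 5, 5))
    normed = batch_norm(features, np.ones(3), np.zeros(3))
    print(normed.mean(axis=(0, 2, 3)))
```

In a real pipeline the rescaled copies would still need to be cropped or padded to the network's fixed input size, a step the abstract does not describe and the sketch leaves out.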

CLC number: