Small Sample Voiceprint Recognition Method Based on Deep Learning

Computer Engineering, 2019, Vol. 45, Issue (3): 262-267, 272. doi: 10.19678/j.issn.1000-3428.0049975

LI Jing^1a, SUN Cunwei^1b, XIE Kai^1a, HE Jianbiao^2

1a. School of Electronic and Information; 1b. School of Computer Science, Yangtze University, Jingzhou, Hubei 434023, China; 2. College of Information Science and Engineering, Central South University, Changsha 410083, China

Received: 2018-01-04 Online: 2019-03-15 Published: 2019-03-15
About the authors: LI Jing (born 1996), male, master's student, research interests in speech signal processing and image processing; SUN Cunwei (corresponding author), master's student; XIE Kai, professor and doctoral supervisor; HE Jianbiao, associate professor.
Funding: National Natural Science Foundation of China (61272147); Hubei Provincial Department of Education Project (B2015446); Yangtze University Youth Fund (2016cqn10); College Students' Innovation and Entrepreneurship Program Fund (2017009).

Abstract:

When a Convolutional Neural Network (CNN) is trained on a small set of voiceprint samples, the network fails to converge well, which results in a low recognition rate. This paper therefore proposes a new voiceprint recognition method. A deep CNN extracts the latent features of the voiceprint. To compensate for the shortage of training samples, an image increasing (augmentation) algorithm based on the principle of convex lens imaging is applied during CNN training, and Fast Batch Normalization (FBN) is introduced into the convolution process to speed up network convergence and shorten training time. Training, validation, and testing are carried out on the TIMIT speech database, which contains the voices of 630 speakers. Experimental results show that the FBN-Alexnet network reduces training time by 48.2% compared with the original Alexnet network, and that the proposed method improves the recognition rate by 7.3%, 2.2%, and 2.8% compared with the GMM, GMM-UBM, and GMM-SVM methods, respectively. These results indicate that the method is effective for small-sample voiceprint recognition.

Key words: voiceprint recognition, deep learning, FBN-Alexnet network, small sample, Fast Batch Normalization (FBN), image increasing algorithm
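
The abstract does not spell out how the convex-lens-based image increasing algorithm works. As a rough illustration only, the sketch below assumes that each extra training image is a rescaled copy of a spectrogram, with the scale factor taken from the thin-lens equation 1/f = 1/u + 1/v (magnification m = v/u). The function names, the nearest-neighbour resampling, and the choice of object distances are all hypothetical and are not taken from the paper; the inversion of a real lens image is ignored.

```python
import numpy as np


def lens_magnification(u: float, f: float) -> float:
    """Thin-lens magnification v/u for object distance u and focal length f."""
    if u <= f:
        raise ValueError("object distance must exceed the focal length")
    v = u * f / (u - f)  # image distance from 1/f = 1/u + 1/v
    return v / u


def rescale_nearest(img: np.ndarray, m: float) -> np.ndarray:
    """Resize a 2-D array by factor m with nearest-neighbour sampling."""
    h, w = img.shape
    new_h = max(1, int(round(h * m)))
    new_w = max(1, int(round(w * m)))
    # Map each output pixel back to its source pixel, clamped to the image.
    rows = np.minimum((np.arange(new_h) / m).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / m).astype(int), w - 1)
    return img[np.ix_(rows, cols)]


def augment(img: np.ndarray, f: float, distances: list[float]) -> list[np.ndarray]:
    """Produce one rescaled copy of img per assumed object distance."""
    return [rescale_nearest(img, lens_magnification(u, f)) for u in distances]


# Toy 8x8 "spectrogram"; real inputs would be voiceprint images.
spec = np.arange(64, dtype=float).reshape(8, 8)
copies = augment(spec, f=1.0, distances=[1.5, 2.0, 3.0])
```

With a focal length of 1.0, the three assumed object distances give magnifications of 2, 1, and 0.5, so one source image yields an enlarged, an unchanged, and a reduced copy. In practice the copies would still need to be cropped or padded to the fixed input size the network expects.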

CLC number: TN912.34

LI Jing,SUN Cunwei,XIE Kai,HE Jianbiao. Small Sample Voiceprint Recognition Method Based on Deep Learning[J]. Computer Engineering, 2019, 45(3): 262-267,272.

http://www.ecice06.com/CN/Y2019/V45/I3/262


