Primi Speech Recognition Based on Kaldi

Abstract

To improve the performance of a Primi speech recognition system, a deep learning model, which is a large-capacity and complex network model, is introduced into Primi speech recognition. Using the Kaldi speech recognition toolkit as the experimental platform, five different acoustic models are trained, one of which is a deep neural network with four hidden layers. A comparison of the speech recognition rates obtained with the different acoustic models shows that the G-DNN model improves the recognition rate by 49.8% on average over the Monophone model. Experimental results show that the recognition rate of the deep-learning-based Primi system improves as the amount of Primi speech corpus in the training set increases, and that this system is more robust than the systems built on the other four acoustic models.

Key words: Primi, deep learning, Kaldi speech recognition toolkit, speech recognition, robustness
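The abstract compares acoustic models by their speech recognition rate but does not say how that rate is computed. The sketch below assumes the common definition, word accuracy = 100% minus word error rate, with errors counted by edit distance between reference and recognized word sequences. The function names and the toy transcripts are hypothetical and are not taken from the paper or from Kaldi. The abstract also does not state whether the 49.8% gain is absolute or relative, so the sketch only shows an absolute difference in percentage points.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recognition_rate(refs, hyps):
    """Word accuracy in percent: 100 * (1 - word errors / reference words)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * (1.0 - errors / words)


# Hypothetical transcripts, for illustration only.
references = ["w1 w2 w3 w4"]
baseline_output = ["w1 x w3"]      # one substitution, one deletion
improved_output = ["w1 w2 w3 w4"]  # exact match

baseline_rate = recognition_rate(references, baseline_output)
improved_rate = recognition_rate(references, improved_output)
absolute_gain = improved_rate - baseline_rate  # in percentage points
```

Under this assumed definition, two systems are compared by decoding the same test set with each and subtracting the resulting rates.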


CLC Number: TP18

HU Wenjun, FU Meijun, PAN Wenlin. Primi Speech Recognition Based on Kaldi [J]. Computer Engineering.



URL: https://www.ecice06.com/EN/Y2018/V44/I1/199


