
Computer Engineering, 2019, Vol. 45, Issue 6: 249-253, 266. doi: 10.19678/j.issn.1000-3428.0051065

• Artificial Intelligence and Recognition Technology •

Mandarin recognition and improvement based on CTC criterion

ZHANG Limin1, WANG Yanzhe1, ZHANG Bingqiang1, ZHU Nianbin2

  1. Institute of Information Fusion, Naval Aeronautical University, Yantai, Shandong 264000, China; 2. Troops 61923 of PLA, Beijing 100000, China
  • Received: 2018-04-03 Online: 2019-06-15 Published: 2019-06-15
  • About the authors: ZHANG Limin (born 1966), male, professor and doctoral supervisor; research interests: artificial intelligence and electronic system simulation. WANG Yanzhe (corresponding author), master's student. ZHANG Bingqiang, associate professor, Ph.D. ZHU Nianbin, senior engineer, M.S.
  • Funding:
    Major Research Plan of the National Natural Science Foundation of China (91538201); Taishan Scholars Project Special Fund (ts201511020).

Abstract: The cross-entropy criterion used in mainstream neural network training optimizes the classification of each individual frame of acoustic data, whereas continuous speech recognition is measured by sequence-level transcription accuracy. To address this mismatch, an end-to-end speech recognition system based on sequence-level transcription is constructed. Phonemes are used as the basic modeling units, and the Long Short-Term Memory (LSTM) network structure is improved with the Connectionist Temporal Classification (CTC) objective function. A lexicon and a language model are introduced into the decoding process, and tone features are added at the front end to enrich the acoustic features. Sequence discriminative training is then used to improve the modeling performance of the CTC model. Experimental results show that the proposed system improves both recognition efficiency and recognition accuracy, reducing the Word Error Rate (WER) to as low as 19.09%±0.16%.

Key words: sequence level, end-to-end, decoding, acoustic feature, discriminative training
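The abstract's central contrast is between frame-level cross-entropy, which needs a target label for every frame, and the sequence-level CTC objective, which scores only the phoneme sequence. CTC does this by adding a blank symbol and summing the probabilities of every frame-by-frame path that collapses (merge repeats, then drop blanks) to the target sequence. The following is a minimal pure-Python sketch of the standard CTC forward (alpha) recursion, not the authors' implementation; the blank index 0, the integer phoneme IDs, and the toy probabilities are assumptions for illustration.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_neg_log_likelihood(log_probs: Sequence[Sequence[float]],
                           labels: Sequence[int], blank: int = 0) -> float:
    """Return -log p(labels | x), summed over all alignments that collapse to labels.

    log_probs[t][k]: log posterior of symbol k at frame t (e.g. an LSTM softmax).
    labels: target phoneme IDs (assumed not to contain the blank).
    """
    # Extended sequence with blanks around every label: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    # A path may start on the leading blank or on the first label
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]                              # stay on the same symbol
            if s >= 1:
                acc = logsumexp(acc, alpha[s - 1])      # advance by one position
            # Skip a blank only between two *different* labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = logsumexp(acc, alpha[s - 2])
            new[s] = acc + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank
    total = alpha[0] if S == 1 else logsumexp(alpha[S - 1], alpha[S - 2])
    return -total


if __name__ == "__main__":
    # Toy input: 2 frames, symbols {0: blank, 1: "a"}, uniform posteriors
    uniform = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
    print(round(math.exp(-ctc_neg_log_likelihood(uniform, [1])), 6))  # 0.75
```

In the toy input, the paths `aa`, `a-` and `-a` all collapse to `a`, so p(a | x) = 3 × 0.25 = 0.75; no single frame labelling is required, which is exactly what makes the criterion sequence-level. Production systems compute the same quantity on batched tensors, for example with PyTorch's `torch.nn.CTCLoss`; the log-space loop here favors readability over speed.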

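The reported 19.09% figure is a word error rate: the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recognized word sequence into the reference, divided by the number of reference words N. Below is a minimal sketch using a two-row Levenshtein dynamic program. It assumes both transcripts are already segmented into words (the abstract does not say how Mandarin word segmentation was handled for scoring), and it is not the authors' scoring tool; the example transcripts are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (S + D + I) / N using the Levenshtein distance over word tokens."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between reference[:i-1] and hypothesis[:j];
    # row 0 costs j insertions.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # column 0: delete all i reference words
        for j in range(1, m + 1):
            substitute = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m] / n


if __name__ == "__main__":
    # Hypothetical, pre-segmented Mandarin transcripts
    ref = ["今天", "天气", "很", "好"]
    hyp = ["今天", "天", "很", "好"]
    print(word_error_rate(ref, hyp))  # 0.25 (one substitution out of four words)
```

Because insertions are counted, WER is not bounded by 100%: a hypothesis much longer than its reference can score above 1.0.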