Speech Repeating System Based on Speech Recognition and Speaking Rate Modification

doi:10.3969/j.issn.1000-3428.2011.05.098

Computer Engineering, 2011, Vol. 37, Issue (5): 288-290.

LIANG Qing-qing, YANG Hong-wu, GUO Wei-tong, PEI Dong

(College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China)

Online: 2011-03-05 Published: 2012-10-31
About the authors: LIANG Qing-qing (born 1983), female, master's student, main research interest: speech recognition; YANG Hong-wu (corresponding author), professor, Ph.D.; GUO Wei-tong, assistant experimentalist, M.S.; PEI Dong, associate professor
Supported by:
National Natural Science Foundation of China, General Program (60875015); Key Scientific Research Fund Project of the Ministry of Education of China (208146)

Abstract

To support listening practice in English learning, this paper presents a digital speech repeating system with adjustable speaking rate and synchronized subtitles. The system modifies the speaking rate with the TD-PSOLA algorithm and produces subtitles with large-vocabulary continuous speech recognition. Users choose the speech segment to repeat by selecting its subtitle, and can adjust the speaking rate of the selected segment in real time. In a Mean Opinion Score (MOS) evaluation, the rate-modified speech achieves an average MOS of 4.1, close to the quality of the original speech. In the speech recognition evaluation, the word-level accuracy on clean speech from English listening material reaches 70.8%, which the authors consider sufficient for English listening practice.

Key words: speaking rate modification, speech recognition, subtitle synchronization, speech repeating
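
The abstract names TD-PSOLA as the speaking-rate algorithm without describing it. As a rough illustration only, the sketch below shows the core idea in pure Python: two-period, Hann-windowed grains are cut around pitch marks and overlap-added at a new spacing in time, so the duration changes while the pitch period stays the same. It is not the authors' implementation. The function names `hann` and `psola_time_stretch`, the `rate` parameter, and the assumption that pitch marks are already available for a fully voiced signal are all hypothetical; a practical system also needs pitch-mark estimation and handling of unvoiced speech.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def psola_time_stretch(x, pitch_marks, rate):
    """Simplified TD-PSOLA-style time scaling (illustrative sketch).

    x           : list of samples, assumed voiced / quasi-periodic
    pitch_marks : increasing sample indices, one per pitch period (assumed given)
    rate        : > 1 speaks faster (shorter output), < 1 speaks slower
    """
    if len(pitch_marks) < 3:
        raise ValueError("need at least three pitch marks")
    out_len = round(len(x) / rate)
    y = [0.0] * out_len
    last = len(pitch_marks) - 2          # last mark with a neighbour on both sides

    t_syn = pitch_marks[1] / rate        # first synthesis mark, in output time
    while t_syn < out_len:
        t_ana = t_syn * rate             # matching position in the input
        # Interior analysis mark closest to that input position.
        i = min(range(1, last + 1), key=lambda k: abs(pitch_marks[k] - t_ana))
        m = pitch_marks[i]
        # Local period: the shorter neighbour distance keeps the grain in bounds.
        p = min(m - pitch_marks[i - 1], pitch_marks[i + 1] - m)
        w = hann(2 * p)
        start = round(t_syn) - p
        # Overlap-add the windowed two-period grain at the synthesis mark.
        for k in range(2 * p):
            j = start + k
            if 0 <= j < out_len:
                y[j] += x[m - p + k] * w[k]
        t_syn += p                       # original period spacing keeps the pitch
    return y


if __name__ == "__main__":
    fs, f0 = 8000, 100                   # hypothetical test tone: 100 Hz at 8 kHz
    tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(fs)]
    marks = list(range(0, fs, fs // f0)) # one mark per 80-sample period
    slow = psola_time_stretch(tone, marks, 0.5)
    print(len(tone), len(slow))
```

With `rate=0.5` the grains are repeated so the output is twice as long, and with `rate=2.0` grains are skipped; in both cases consecutive grains stay one pitch period apart, which is why the perceived pitch is preserved.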

CLC Number: TP391

LIANG Qing-qing, YANG Hong-wu, GUO Wei-tong, PEI Dong. Speech Repeating System Based on Speech Recognition and Speaking Rate Modification[J]. Computer Engineering, 2011, 37(5): 288-290.

http://www.ecice06.com/CN/Y2011/V37/I5/288
