Unsupervised Speech Pattern Extraction Based on Speech Recognition and Feature

doi:10.3969/j.issn.1000-3428.2014.05.054

Computer Engineering


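ZHANG Zhen, ZHAO Qing-wei, YAN Yong-hong

(Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190, China)

Received: 2013-05-02  Online: 2014-05-15  Published: 2014-05-14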
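About the authors: ZHANG Zhen (born 1984), male, Ph.D. candidate; main research interests: speech recognition and keyword search. ZHAO Qing-wei and YAN Yong-hong: research professors and doctoral supervisors.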
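Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); National High Technology Research and Development Program of China ("863" Program) (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2); Strategic Priority Research Program of the Chinese Academy of Sciences, "Research on New-Generation Information Technology for Sensing China" (XDA06030100, XDA06030500).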


Abstract

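This paper searches, in an unsupervised manner, for the language patterns that occur in an unknown speech stream, using both a speech-recognition-based system and a feature-based system. The recognition-based system takes the multiple candidate (alternative) results of the speech recognizer and searches them for language patterns with the segmental dynamic time warping (SDTW) algorithm; an effective clustering algorithm and a confidence estimation algorithm are then applied to improve system performance. The second system detects similar audio segments by feature matching alone: it uses no knowledge source and extracts repeated speech patterns directly from the speech. The two systems are compared on two test sets, radio and television news and spontaneous spoken dialogue. Experimental results show that the recognition-based system achieves better detection performance, whereas the feature-based system generalizes to multiple languages.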
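To make the segmental DTW search more concrete, here is a minimal, hypothetical sketch in the spirit of the segmental DTW formulation of Park and Glass; it is not the authors' implementation. It assumes real-valued feature frames compared with Euclidean distance (closer to the feature-based system; the recognition-based system would need a distance between recognition candidates instead), a band half-width `r` with start points every `2r + 1` frames, an alignment end point chosen by length-normalised cost, and a brute-force search for the lowest-average sub-path of at least `min_len` cells in place of the original length-constrained minimum-average search. The names `frame_dist`, `band_dtw`, `min_average_subpath`, and `segmental_dtw` are illustrative only.

```python
# Hypothetical segmental DTW sketch. Euclidean frame distance, the band/start-point
# scheme, the end-point rule and the brute-force sub-path search are assumptions.
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
Cell = Tuple[int, int]


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def band_dtw(x: List[Frame], y: List[Frame], i0: int, j0: int, r: int) -> List[Tuple[int, int, float]]:
    """DTW from (i0, j0) inside the diagonal band |(i - i0) - (j - j0)| <= r.

    Returns the aligned path as (i, j, local_distance) triples.
    """
    n, m = len(x), len(y)
    acc: Dict[Cell, float] = {}   # accumulated cost per cell
    back: Dict[Cell, Cell] = {}   # back-pointer to the best predecessor
    for i in range(i0, n):
        lo = max(j0, j0 + (i - i0) - r)
        if lo >= m:
            break  # the band has left the matrix
        for j in range(lo, min(m, j0 + (i - i0) + r + 1)):
            d = frame_dist(x[i], y[j])
            if (i, j) == (i0, j0):
                acc[(i, j)] = d
                continue
            preds = [p for p in ((i - 1, j), (i, j - 1), (i - 1, j - 1)) if p in acc]
            if not preds:
                continue
            best = min(preds, key=lambda p: acc[p])
            acc[(i, j)] = d + acc[best]
            back[(i, j)] = best
    # The path ends on the matrix boundary; pick the end with the lowest length-normalised cost.
    ends = [c for c in acc if c[0] == n - 1 or c[1] == m - 1]
    end = min(ends, key=lambda c: acc[c] / (c[0] - i0 + c[1] - j0 + 2))
    path = [end]
    while path[-1] in back:
        path.append(back[path[-1]])
    path.reverse()
    return [(i, j, frame_dist(x[i], y[j])) for i, j in path]


def min_average_subpath(costs: List[float], min_len: int) -> Tuple[float, int, int]:
    """Contiguous run [s, e) of at least min_len costs with the lowest mean (brute force)."""
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    best = (math.inf, 0, 0)
    for s in range(len(costs)):
        for e in range(s + min_len, len(costs) + 1):
            avg = (prefix[e] - prefix[s]) / (e - s)
            if avg < best[0]:
                best = (avg, s, e)
    return best


def segmental_dtw(x: List[Frame], y: List[Frame], r: int = 2, min_len: int = 5):
    """Run band-limited DTW from start points spaced 2r+1 apart along both axes.

    Returns (average distance, (x_start, x_end), (y_start, y_end)) hits, best first.
    """
    step = 2 * r + 1
    starts = [(i, 0) for i in range(0, len(x), step)] + [(0, j) for j in range(step, len(y), step)]
    hits = []
    for i0, j0 in starts:
        path = band_dtw(x, y, i0, j0, r)
        avg, s, e = min_average_subpath([c for _, _, c in path], min_len)
        if math.isinf(avg):
            continue  # path shorter than min_len
        seg = path[s:e]
        hits.append((avg, (seg[0][0], seg[-1][0]), (seg[0][1], seg[-1][1])))
    return sorted(hits)


if __name__ == "__main__":
    # Toy data: the same 8-frame "word" embedded in two different contexts.
    word = [[float(k), float(k % 3)] for k in range(8)]
    utt_a = [[20.0, 20.0]] * 6 + word + [[20.0, -20.0]] * 5
    utt_b = [[-20.0, 5.0]] * 4 + word + [[-20.0, -5.0]] * 3
    for avg, (xs, xe), (ys, ye) in segmental_dtw(utt_a, utt_b, r=2, min_len=5)[:3]:
        print(f"avg={avg:.2f}  x[{xs}..{xe}]  y[{ys}..{ye}]")
```

Each hit pairs a frame span in one utterance with a span in the other; low average distances mark candidate repeated patterns, which a subsequent clustering step would group into pattern classes.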

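Key words: speech recognition, speech pattern discovery, segmental dynamic time warping algorithm, graph clustering algorithm, phoneme-loop posterior probability computation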


CLC number: TN912.34

Cite this article: ZHANG Zhen, ZHAO Qing-wei, YAN Yong-hong. Unsupervised Speech Pattern Extraction Based on Speech Recognition and Feature[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2014.05.054.

http://www.ecice06.com/CN/Y2014/V40/I5/262

Editor: GU Yi-fei
