Computer Engineering (计算机工程)

• Multimedia Technology and Application •

Unsupervised Speech Pattern Extraction Based on Speech Recognition and Feature

ZHANG Zhen, ZHAO Qing-wei, YAN Yong-hong

  1. (Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2013-05-02 Online: 2014-05-15 Published: 2014-05-14
  • About the authors: ZHANG Zhen (born 1984), male, Ph.D. candidate; main research interests: speech recognition and keyword search. ZHAO Qing-wei and YAN Yong-hong, research professors and doctoral supervisors.
  • Funding:
    National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); National High Technology Research and Development Program of China ("863" Program) (2012AA012503); Key Deployment Program of the Chinese Academy of Sciences (KGZD-EW-103-2); Strategic Priority Research Program of the Chinese Academy of Sciences, "Next-Generation Information Technology for Sensing China" (XDA06030100, XDA06030500).

Abstract: This paper proposes unsupervised methods for discovering speech patterns that recur in unknown speech streams, using both a speech-recognition-based system and a feature-based system. In the recognition-based system, the multiple candidate results produced by the decoder are searched for speech patterns with a segmental dynamic time warping algorithm; a graph clustering algorithm and a confidence estimation algorithm are then applied to improve system performance. The feature-based system detects similar audio segments by feature matching alone: it uses no knowledge source and extracts repeated speech patterns directly from the speech. The two systems are compared on two test sets, radio and television news and spontaneous spoken dialogue. Experimental results show that the recognition-based system achieves better detection performance, while the feature-based system can be extended to multiple languages.

Key words: speech recognition, speech pattern discovery, segmental dynamic time warping algorithm, graph clustering algorithm, phoneme loop posterior probability calculation
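
The abstract names segmental dynamic time warping as the search step but gives no implementation details. The following is a minimal Python sketch of the general idea, not the authors' method: it aligns two sequences of feature frames inside several diagonal bands and keeps the band with the lowest mean frame distance. The function names (`frame_distance`, `banded_dtw`, `segmental_dtw`), the Euclidean distance, and the `band` and `min_len` parameters are all assumptions made for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def banded_dtw(x, y, i0, j0, band):
    """Align x[i0:] with y[j0:], keeping the path within `band` cells of
    the diagonal through (i0, j0).

    Returns (total_cost, path_length). The DP minimises total cost per
    cell; the endpoint on the last row or column with the lowest mean
    cost is then selected (a heuristic, not an exact mean-cost optimum).
    """
    n, m = len(x) - i0, len(y) - j0
    inf = math.inf
    # cost[i][j] = (accumulated distance, steps) of the best path to (i, j)
    cost = [[(inf, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - band), min(m, i + band + 1)):
            d = frame_distance(x[i0 + i], y[j0 + j])
            if i == 0 and j == 0:
                cost[i][j] = (d, 1)
                continue
            candidates = []
            if i > 0:
                candidates.append(cost[i - 1][j])
            if j > 0:
                candidates.append(cost[i][j - 1])
            if i > 0 and j > 0:
                candidates.append(cost[i - 1][j - 1])
            best = min(candidates)
            if best[0] < inf:
                cost[i][j] = (best[0] + d, best[1] + 1)
    # A path ends once it reaches the end of either sequence.
    ends = [cost[n - 1][j] for j in range(m)]
    ends += [cost[i][m - 1] for i in range(n)]
    ends = [e for e in ends if e[0] < inf]
    return min(ends, key=lambda e: e[0] / e[1])


def segmental_dtw(x, y, band=1, min_len=3):
    """Try diagonal bands starting along the top and left edges of the
    distance matrix; return (mean_cost, i0, j0) for the best band, or
    None if no band yields a path of at least `min_len` steps."""
    step = 2 * band + 1
    starts = [(i0, 0) for i0 in range(0, len(x), step)]
    starts += [(0, j0) for j0 in range(step, len(y), step)]
    best = None
    for i0, j0 in starts:
        total, length = banded_dtw(x, y, i0, j0, band)
        if length < min_len:
            continue
        mean = total / length
        if best is None or mean < best[0]:
            best = (mean, i0, j0)
    return best


# Toy one-dimensional "frames": the pattern 1,2,3,4 recurs in y at offset 3.
x = [[1.0], [2.0], [3.0], [4.0]]
y = [[5.0], [5.0], [5.0], [1.0], [2.0], [3.0], [4.0]]
result = segmental_dtw(x, y, band=1)
```

In this toy case the band starting at offset 3 of `y` aligns the repeated pattern exactly, so it is the one returned. A real system would run such a search over acoustic features or recognizer output and pass the matched segments on to the clustering and confidence estimation stages mentioned in the abstract.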
