
Computer Engineering

• Multimedia Technology and Application •

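Continuous Speech Keyword Detection Based on Word Level Discriminative Point Process Model

WANG Yong, ZHANG Lian-hai

  1. (School of Information System Engineering, PLA Information Engineering University, Zhengzhou 450002, China)
  • Received: 2013-03-05  Online: 2014-05-15  Published: 2014-05-14
  • About the authors: WANG Yong (born 1987), male, master's student; research interest: continuous speech keyword detection. ZHANG Lian-hai, associate professor.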

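Abstract: A continuous speech keyword detection method based on a word-level Discriminative Point Process Model (DPPM) is proposed. Frame-level posterior probabilities for each phone are computed with a temporal pattern structure and a multilayer perceptron. The DPPM treats the point process formed by multiple phone events within a time span as a whole and casts keyword detection as a binary classification problem: the point process is segmented and concatenated into a supervector, which is fed to a Support Vector Machine (SVM) classifier that decides whether the speech segment is the target keyword. Because the method fully accounts for the contextual dependencies of the speech signal and models words directly as the basic unit, it improves the accuracy and robustness of detection. Experimental results show that, on the sampled speech, the average keyword recall and precision exceed 71.5% and 84.6%, respectively, and that system performance can be improved further by incorporating relevant language model knowledge.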

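Key words: Point Process Model (PPM), phoneme posterior probability, temporal pattern, keyword detection, Support Vector Machine (SVM), Discriminative Point Process Model (DPPM)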
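The abstract describes the front end only at a high level: frame-level phone posteriors, a point process of phone events, and a supervector built by segmenting and concatenating that point process. The sketch below shows one way those two steps could look. It assumes, without support from the text above, that an event is placed wherever a phone's posterior has a local maximum at or above a threshold, and that the candidate window is split into equal-length segments whose per-phone event counts are normalized and concatenated. The names `extract_events` and `supervector`, the 0.5 threshold, the segment count and the toy posteriorgram are all hypothetical, and the multilayer perceptron that would produce the posteriors is not shown.

```python
from typing import Dict, List, Sequence


def extract_events(posteriors: Sequence[Sequence[float]], phones: Sequence[str],
                   threshold: float = 0.5) -> Dict[str, List[int]]:
    """Convert a frame-level phone posteriorgram into a point process.

    posteriors[t][k] is the posterior of phone k at frame t. An event for
    phone k is placed at frame t when the posterior is a local maximum that
    reaches `threshold` (hypothetical event-selection rule).
    """
    n_frames = len(posteriors)
    events: Dict[str, List[int]] = {p: [] for p in phones}
    for k, phone in enumerate(phones):
        for t in range(n_frames):
            cur = posteriors[t][k]
            prev = posteriors[t - 1][k] if t > 0 else float("-inf")
            nxt = posteriors[t + 1][k] if t + 1 < n_frames else float("-inf")
            # Strict rise on the left, non-strict on the right: a flat peak
            # yields a single event at its first frame.
            if cur >= threshold and cur > prev and cur >= nxt:
                events[phone].append(t)
    return events


def supervector(events: Dict[str, List[int]], phones: Sequence[str],
                start: int, end: int, n_segments: int = 4) -> List[float]:
    """Build a fixed-length vector for the candidate window [start, end).

    The window is cut into n_segments equal parts; each phone's events are
    counted per part, the rows are concatenated phone by phone, and the
    result is normalized by the total event count in the window.
    """
    length = end - start
    counts: List[float] = []
    for phone in phones:
        row = [0.0] * n_segments
        for t in events[phone]:
            if start <= t < end:
                seg = min((t - start) * n_segments // length, n_segments - 1)
                row[seg] += 1.0
        counts.extend(row)
    total = sum(counts)
    return [c / total for c in counts] if total else counts


if __name__ == "__main__":
    phones = ["k", "ae", "t"]  # hypothetical phone inventory
    posteriors = [             # 6 frames x 3 phones, toy values
        [0.90, 0.05, 0.05],
        [0.60, 0.30, 0.10],
        [0.10, 0.80, 0.10],
        [0.10, 0.40, 0.50],
        [0.05, 0.10, 0.85],
        [0.10, 0.10, 0.80],
    ]
    events = extract_events(posteriors, phones)
    print(events)  # → {'k': [0], 'ae': [2], 't': [4]}
    print(supervector(events, phones, 0, 6, n_segments=3))
```

With three phones and three segments the window maps to a 9-dimensional vector. Each phone contributes a block of segment-wise counts, so the vector keeps coarse timing information as well as which phones fired, and its length does not depend on the window's duration, which is what a fixed-input classifier needs.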
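The abstract states that the supervector is fed to an SVM classifier that makes a keyword versus non-keyword decision, and it reports average recall and precision. The following is a minimal, dependency-free sketch under stated assumptions: a linear SVM trained with the Pegasos stochastic sub-gradient method stands in for whatever kernel and solver the authors actually used (neither is given here), and recall and precision are computed for one keyword from binary decisions. `train_linear_svm`, `is_keyword`, `recall_precision`, the hyperparameters and the toy two-dimensional data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 300,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Train a linear SVM with Pegasos (stochastic sub-gradient on hinge loss).

    Labels are +1 (keyword) / -1 (non-keyword). The bias is learned as the
    weight of an appended constant feature, so it is regularized as well.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if margin < 1.0:  # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def is_keyword(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Binary decision: a positive SVM score means 'this window is the keyword'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0


def recall_precision(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    # Toy 2-D "supervectors"; real ones would be much longer.
    X = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    predicted = [is_keyword(w, b, x) for x in X]
    print(recall_precision(predicted, [label == 1 for label in y]))
```

Averaging such per-keyword recall and precision values over all keywords would yield summary figures of the kind quoted in the abstract; the exact averaging the paper uses is not stated, so that step is left out here.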
