Computer Engineering

• Development Research and Engineering Application •

Speaker Change Detection Based on Computational Auditory Scene Analysis

YANG Dengzhou 1,2, LIU Jia 3, XIA Shanhong 1

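  (1. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China)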
  • Received: 2017-02-14 Online: 2018-02-15 Published: 2018-02-25
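  • About the authors: YANG Dengzhou (born 1986), male, Ph.D. candidate, whose main research interest is speaker recognition; LIU Jia, professor, Ph.D.; XIA Shanhong, research professor, Ph.D.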
  • Funding:
    National Natural Science Foundation of China project "Speaker Recognition under Noise and Short-Utterance Conditions" (61370034).

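Abstract: In Speaker Change Detection (SCD) with short utterances and rapid speaker alternation, little continuous speech is available to train each speaker model. The resulting models are not robust, and SCD performance is poor. To address this, a new SCD method based on Computational Auditory Scene Analysis (CASA) is proposed. Following the auditory processing mechanism of the human ear, the method decomposes the speech signal into multiple sub-bands. This yields accurate voiced/unvoiced boundaries, so scattered voiced and unvoiced sub-segments can be spliced together. The Bayesian Information Criterion (BIC) then decides candidate change points between speech sub-segments, and pitch features extracted from the voiced portions verify each candidate interval. Experimental results show that under extremely short speech conditions (average sub-segment duration of 1.34 s), the method reduces the Equal Error Rate (EER) of SCD to 23.2% and achieves an F1 value of 70%.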
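The BIC decision between adjacent speech sub-segments can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the following:

- Each sub-segment is a (frames x dims) matrix of frame-level features, for example GECC.
- Each sub-segment is modelled by a single full-covariance Gaussian.
- Each sub-segment has more frames than feature dimensions.

The helper names `delta_bic` and `candidate_changes` and the penalty weight `penalty` (the usual λ) are hypothetical. The pitch-based interval verification is not shown.

```python
import numpy as np


def _logdet_ml_cov(seg: np.ndarray) -> float:
    # Log-determinant of the maximum-likelihood (biased) covariance of a
    # (frames x dims) segment; assumes more frames than dimensions.
    cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet)


def delta_bic(x1: np.ndarray, x2: np.ndarray, penalty: float = 1.0) -> float:
    """Delta-BIC for the hypothesis 'x1 and x2 come from different speakers'.

    Positive value: two full-covariance Gaussians explain the data better
    than one even after the complexity penalty, so a speaker change is
    hypothesised at the boundary between x1 and x2.
    """
    x = np.vstack([x1, x2])
    n, d = x.shape
    # Extra free parameters of the two-model hypothesis: one mean + one full covariance.
    n_params = d + 0.5 * d * (d + 1)
    return (0.5 * n * _logdet_ml_cov(x)
            - 0.5 * len(x1) * _logdet_ml_cov(x1)
            - 0.5 * len(x2) * _logdet_ml_cov(x2)
            - penalty * 0.5 * n_params * np.log(n))


def candidate_changes(segments: list, penalty: float = 1.0) -> list:
    # Indices i where a change is hypothesised between segments[i] and segments[i + 1].
    return [i for i in range(len(segments) - 1)
            if delta_bic(segments[i], segments[i + 1], penalty) > 0]
```

In a pipeline like the one described in the abstract, the indices returned by `candidate_changes` would be only candidates that still need verification. Raising `penalty` makes the test more conservative, trading fewer false alarms for more missed changes.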

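Keywords: Speaker Change Detection (SCD), Computational Auditory Scene Analysis (CASA), Gammatone Energy Cepstral Coefficients (GECC), pitch, Bayesian Information Criterion (BIC)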
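The keywords name Gammatone Energy Cepstral Coefficients (GECC), a feature built on a gammatone (auditory) filterbank. A gammatone cepstral front end can be sketched as follows. The pipeline filters the signal with 4th-order gammatone filters whose centre frequencies are equally spaced on the ERB-rate scale. It then sums the energy of each sub-band over each frame, log-compresses the energies, and applies a DCT. This is a generic sketch, not the paper's configuration. The following values are all assumptions: 32 bands from 50 Hz to 0.9 × Nyquist, 25 ms frames with a 10 ms hop, log compression, 13 coefficients, and the helper names.

```python
import numpy as np


def erb_bandwidth(f_hz):
    # Equivalent rectangular bandwidth (Glasberg & Moore), in Hz.
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_space(f_lo, f_hi, n_bands):
    # Centre frequencies equally spaced on the ERB-rate scale.
    def to_rate(f):
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

    def from_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    return from_rate(np.linspace(to_rate(f_lo), to_rate(f_hi), n_bands))


def gammatone_ir(fc, fs, duration=0.064, order=4, b=1.019):
    # Truncated impulse response t^(n-1) exp(-2*pi*b*ERB*t) cos(2*pi*fc*t), unit energy.
    t = np.arange(int(duration * fs)) / fs
    g = (t ** (order - 1)
         * np.exp(-2 * np.pi * b * erb_bandwidth(fc) * t)
         * np.cos(2 * np.pi * fc * t))
    return g / np.sqrt(np.sum(g ** 2))


def dct_ii(n_out, n_in):
    # Orthonormal DCT-II basis of shape (n_out, n_in).
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis


def gecc(signal, fs, n_bands=32, n_ceps=13, frame_len=0.025, hop=0.010):
    # Returns a (frames x n_ceps) matrix of gammatone energy cepstra.
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(signal) - flen) // fhop
    starts = np.arange(n_frames) * fhop
    energies = np.empty((n_frames, n_bands))
    for k, fc in enumerate(erb_space(50.0, 0.9 * fs / 2, n_bands)):
        # Causal sub-band filtering: keep the first len(signal) output samples.
        sub = np.convolve(signal, gammatone_ir(fc, fs))[: len(signal)]
        energies[:, k] = [np.sum(sub[s:s + flen] ** 2) for s in starts]
    # Log-compress the band energies and decorrelate them with a DCT.
    return np.log(energies + 1e-10) @ dct_ii(n_ceps, n_bands).T
```

For one second of 16 kHz audio, these settings give a 98 x 13 feature matrix. Matrices of this form are what a segment-level test such as BIC would operate on.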
