Speaker Change Detection Based on Computational Auditory Scene Analysis

Computer Engineering

• Development Research and Engineering Application

YANG Dengzhou^1,2, LIU Jia^3, XIA Shanhong^1

(1. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China)

Received: 2017-02-14 Online: 2018-02-15 Published: 2018-02-25
About the authors: YANG Dengzhou (born 1986), male, Ph.D. candidate, main research interest: speaker recognition; LIU Jia, professor, Ph.D.; XIA Shanhong, research fellow, Ph.D.
Funding:
National Natural Science Foundation of China, "Speaker Recognition under Noise and Short-Utterance Conditions" (61370034).

Abstract

In speaker change detection (SCD) where speakers alternate rapidly and utterances are short, the continuous speech available for training each speaker model is too short, so the models are not robust and SCD performance suffers. To address this, a new SCD method based on computational auditory scene analysis (CASA) is proposed. Following the auditory processing mechanism of the human ear, the speech signal is decomposed into a number of narrow sub-band signals, which yields accurate boundaries between voiced and unvoiced speech and allows scattered voiced and unvoiced sub-segments to be spliced into speech sub-segments. Candidate speaker change points between speech sub-segments are decided with the Bayesian information criterion (BIC), and pitch features extracted from the voiced portions are used to verify the intervals. Experimental results show that, under very short speech conditions with an average speech sub-segment duration of 1.34 s, the method reduces the equal error rate (EER) of SCD to 23.2% and achieves an F1 value of 70%.

Key words: Speaker Change Detection (SCD), Computational Auditory Scene Analysis (CASA), Gammatone Energy Cepstral Coefficients (GECC), pitch, Bayesian Information Criterion (BIC)
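
The abstract names BIC as the criterion for deciding candidate change points between speech sub-segments but does not give the formula. As a rough illustration only, the sketch below uses the common single-Gaussian, full-covariance ΔBIC formulation, which is an assumption here and not necessarily the exact variant used in the paper. The function name `delta_bic`, the `penalty_weight` parameter, and the small regularization term are hypothetical choices made for the example, and random vectors stand in for real per-frame features such as GECC.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two feature segments (assumed formulation).

    x, y: arrays of shape (n_frames, n_dims), one feature vector per frame.
    A positive value favours modelling the segments with two separate
    Gaussians, i.e. it suggests a speaker change between x and y.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # Maximum-likelihood covariance, lightly regularized (assumption)
        # so the log-determinant stays finite for short segments.
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        cov = cov + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of two Gaussians over one pooled Gaussian.
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    # Model-complexity penalty: d mean terms plus d(d+1)/2 covariance terms.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(0.0, 1.0, size=(200, 4))
    same_b = rng.normal(0.0, 1.0, size=(200, 4))
    other = rng.normal(5.0, 1.0, size=(200, 4))
    print("same source:     ", delta_bic(same_a, same_b))
    print("different source:", delta_bic(same_a, other))
```

In a pipeline like the one the abstract describes, a boundary with positive ΔBIC would only be a candidate change point; the pitch-based interval verification would then confirm or reject it. Very short segments give poor covariance estimates, which is the robustness problem the abstract highlights.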

CLC number: TP391

YANG Dengzhou,LIU Jia,XIA Shanhong. Speaker Change Detection Based on Computational Auditory Scene Analysis[J]. Computer Engineering.

Editor: SUO Shuzhi

