Speaker Recognition Based on Vocal Mechanism and Human Ear Perceptual Characteristic

doi:10.3969/j.issn.1000-3428.2013.11.044

Computer Engineering

DU Xiao-qing, YU Feng-qin

(School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China)

Received: 2012-10-16 Online: 2013-11-15 Published: 2013-11-13
About the authors: DU Xiao-qing (born 1988), female, master's student, main research interest: speech signal processing; YU Feng-qin, professor, Ph.D.
Funding:
National Natural Science Foundation of China (61075008)

Abstract:

The fusion algorithm of Mel Frequency Cepstral Coefficient (MFCC) and Linear Prediction Cepstrum Coefficient (LPCC) reflects only the static characteristics of speech, and LPCC describes the local low-frequency characteristics of speech poorly. To address this, the fusion of the Hilbert-Huang Transform (HHT) cepstrum coefficient and the Relative Spectra-Perceptual Linear Prediction Cepstrum Coefficient (RASTA-PLPCC) is proposed, yielding a speaker recognition algorithm that reflects both the vocal mechanism and human ear perceptual characteristics. The HHT cepstrum coefficient reflects the vocal mechanism, captures the dynamic characteristics of speech, and better describes the local low-frequency characteristics of the signal, which remedies the shortcoming of LPCC. PLPCC reflects human ear perceptual characteristics, and its recognition performance is better than that of MFCC. The two features are combined using three fusion algorithms, and the fused feature is fed into a Gaussian mixture model for speaker recognition. Simulation results show that, compared with the existing fusion of MFCC and LPCC, the proposed fusion algorithm increases the recognition rate by 8.0%.

Key words: speaker recognition, vocal mechanism, human ear perceptual characteristic, Hilbert-Huang Transform (HHT) cepstrum coefficient, perceptual linear prediction cepstrum coefficient, Relative Spectra filtering
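
The abstract does not say which three fusion algorithms are used or how the Gaussian mixture model is trained, so the following is only a minimal sketch under stated assumptions. It assumes the simplest feature-level fusion (frame-wise concatenation of the two coefficient vectors) and scores the fused frames against diagonal-covariance Gaussian mixture models whose parameters are already known. The function names, the toy feature values, and the model parameters are all hypothetical and are not taken from the paper.

```python
import math

def fuse_frames(hht_frames, plp_frames):
    """Concatenate two per-frame feature vectors (an assumed fusion scheme)."""
    if len(hht_frames) != len(plp_frames):
        raise ValueError("both feature streams need the same number of frames")
    return [list(a) + list(b) for a, b in zip(hht_frames, plp_frames)]

def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        peak = max(comp_logs)  # log-sum-exp keeps the sum numerically stable
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total / len(frames)

def identify(frames, speaker_models):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(
        speaker_models,
        key=lambda name: gmm_avg_log_likelihood(frames, *speaker_models[name]),
    )

# Toy data: two frames, a 2-dim "HHT cepstrum" stream and a 1-dim "RASTA-PLPCC" stream.
hht = [[0.1, 0.2], [0.0, 0.3]]
plp = [[1.0], [1.1]]
fused = fuse_frames(hht, plp)

# Hypothetical pre-trained models: (weights, means, variances) per speaker.
models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.2, 1.0], [0.1, 0.3, 1.1]], [[1.0] * 3, [1.0] * 3]),
    "speaker_b": ([1.0], [[5.0, 5.0, 5.0]], [[1.0] * 3]),
}
best = identify(fused, models)
```

In practice the model parameters would come from training (for example, expectation-maximization on each speaker's enrollment data), and the fusion step could equally be a weighted or score-level combination; concatenation is shown here only because it is the easiest scheme to illustrate.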

CLC number: TN912.3

DU Xiao-qing, YU Feng-qin. Speaker Recognition Based on Vocal Mechanism and Human Ear Perceptual Characteristic[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2013.11.044.

https://www.ecice06.com/CN/Y2013/V39/I11/197


