An Improved Mel-frequency Filter for Speaker Recognition

doi:10.3969/j.issn.1000-3428.2013.11.048

Computer Engineering

XIANG Yao-jie^1,2, YANG Jun-an^1,2, LI Jin-hui^1,2, LU Jun^1,2

(1. Department of Information, Electronic Engineering Institute, Hefei 230037, China; 2. Anhui Province Key Laboratory of Electronic Restriction Technology, Hefei 230037, China)

Received: 2012-09-05 Online: 2013-11-15 Published: 2013-11-13
About the authors: XIANG Yao-jie (born 1987), male, master's student, main research interest: speech recognition; YANG Jun-an, professor and doctoral supervisor; LI Jin-hui and LU Jun, master's students
Funding: National Natural Science Foundation of China (60872113)

Abstract

Abstract: Mel-frequency cepstral coefficients (MFCC) emphasize the low-frequency content of the speech signal and describe its spectral distribution incompletely, so they cannot effectively discriminate speaker-specific information. To address this, the paper analyzes how the speaker-specific information carried by each frequency band differs and combines the differing characteristics of the Mel filterbank and the inverted Mel filterbank in the low and high frequency bands to propose an improved Mel filterbank suited to speaker recognition. Experimental results show that the new features extracted with the improved filterbank achieve better recognition performance than both traditional MFCC and inverted MFCC (IMFCC), while adding essentially no time overhead to training or recognition in the speaker recognition system.

Key words: speaker recognition, Mel-frequency Cepstral Coefficient (MFCC), specific information, Inverted Mel-frequency Cepstral Coefficient (IMFCC), spectrum distribution, speech signal
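
The two building blocks named in the abstract can be illustrated with a short Python sketch. It builds a conventional Mel filterbank and an inverted Mel filterbank, assuming the common mapping m = 2595 log10(1 + f/700) and assuming the inverted bank is obtained by flipping the Mel bank along both the filter and frequency axes. The abstract does not say how the paper combines the two banks, so that step is deliberately not shown. All function names and parameter values (20 filters, 512-point FFT, 16 kHz sampling) are hypothetical choices for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common Mel-scale mapping (an assumption; the abstract gives no formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # n_filters + 2 edge points from 0 Hz to Nyquist, uniform in Mel.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2))
    bin_hz = np.linspace(0.0, nyquist, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # Triangle = lower of the two slopes, clipped at zero outside [lo, hi].
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def inverted_mel_filterbank(n_filters, n_fft, sample_rate):
    """Mirror image of the Mel bank: narrow filters at high frequencies.

    Flipping the frequency axis mirrors each triangle about the middle of
    the band; flipping the filter axis restores ascending center order.
    """
    return mel_filterbank(n_filters, n_fft, sample_rate)[::-1, ::-1]

def peak_frequencies(bank, sample_rate):
    """Frequency in Hz of the largest weight in each filter."""
    n_bins = bank.shape[1]
    return bank.argmax(axis=1) * (sample_rate / 2.0) / (n_bins - 1)

if __name__ == "__main__":
    sr = 16000
    mel = mel_filterbank(20, 512, sr)
    inv = inverted_mel_filterbank(20, 512, sr)
    print("Mel peak spacing (Hz):     ", np.diff(peak_frequencies(mel, sr)))
    print("Inverted peak spacing (Hz):", np.diff(peak_frequencies(inv, sr)))
```

Printing the peak spacings shows the complementary behavior the abstract relies on: the Mel bank packs its filters densely at low frequencies, while the inverted bank packs them densely at high frequencies.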

CLC number: TN912.34

XIANG Yao-jie, YANG Jun-an, LI Jin-hui, LU Jun. An Improved Mel-frequency Filter for Speaker Recognition[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2013.11.048.

http://www.ecice06.com/CN/Y2013/V39/I11/214
