Computer Engineering

• Development Research and Engineering Application •

Feature Extraction of Emotional Speech Based on Ensemble Empirical Mode Decomposition

ZHANG Le, ZHANG Xueying, SUN Ying, ZHANG Wei

  1. (College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China)
  • Received: 2016-05-18 Online: 2017-08-15 Published: 2017-08-15
  • About the authors: ZHANG Le (born 1991), female, master's student, main research interest: speech emotion recognition; ZHANG Xueying, professor and doctoral supervisor; SUN Ying, lecturer; ZHANG Wei, doctoral student.
  • Funding:
    National Natural Science Foundation of China (61371193); Shanxi Province Research Fund for Returned Overseas Scholars (2013-034).

Abstract: Feature extraction is a key stage of an emotional speech recognition system and determines its overall recognition performance. Traditional feature extraction techniques assume that the speech signal is linear and short-time stationary, and they are not adaptive. To address this, features are extracted in a nonlinear way using the Ensemble Empirical Mode Decomposition (EEMD) algorithm. The emotional speech signal is first decomposed by EEMD into a set of Intrinsic Mode Functions (IMFs), and the effective IMFs are selected using a correlation coefficient method. IMF Energy (IMFE) features are then computed from the selected set. The Berlin speech database is used as the experimental data source. IMFE features, prosodic features, Mel-Frequency Cepstral Coefficient (MFCC) features, and the fusion of the three are each input to a Support Vector Machine (SVM), and the recognition results obtained with the different features are compared to validate the IMFE features. The experimental results show that the average recognition rate reaches 91.67% when IMFE features are fused with acoustic features, and that IMFE features can effectively distinguish between different emotional states.

Key words: feature extraction, Ensemble Empirical Mode Decomposition (EEMD), Intrinsic Mode Function (IMF), Spearman Rank correlation coefficient, acoustic feature, emotional speech recognition
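The selection and energy steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the IMFs have already been produced by some EEMD routine and are passed in as plain lists. It also assumes that "effective" means the absolute Spearman rank correlation with the original signal meets a threshold, and that the energy of an IMF is the sum of its squared samples. The function names, the threshold value of 0.1, and the toy data are all hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_imfs(signal, imfs, threshold=0.1):
    """Keep IMFs whose |Spearman rho| with the signal meets the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [imf for imf in imfs if abs(spearman(signal, imf)) >= threshold]


def imf_energy(imf):
    """Energy taken as the sum of squared samples (an assumed definition)."""
    return sum(v * v for v in imf)


def imfe_features(signal, imfs, threshold=0.1):
    """Energy of each selected IMF, in the original IMF order."""
    return [imf_energy(imf) for imf in select_imfs(signal, imfs, threshold)]


# Toy data standing in for a speech frame and two precomputed IMFs.
toy_signal = [1, 2, 3, 4, 5]
toy_imfs = [[2, 4, 6, 8, 10], [1, -1, 1, -1, 1]]
print(imfe_features(toy_signal, toy_imfs))  # [220]
```

In the toy data the second component has zero rank correlation with the signal, so it is discarded and only the energy of the first component remains. In a pipeline like the one the abstract describes, a feature vector of this kind would then be combined with prosodic and MFCC features before classification by an SVM.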

CLC Number: