Feature Extraction of Emotional Speech Based on Ensemble Empirical Mode Decomposition

doi:10.3969/j.issn.1000-3428.2017.08.052

Computer Engineering


ZHANG Le, ZHANG Xueying, SUN Ying, ZHANG Wei

(College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China)

Received: 2016-05-18  Published: 2017-08-15  Online: 2017-08-15
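About the authors: ZHANG Le (born 1991), female, master's student, whose main research interest is speech emotion recognition; ZHANG Xueying, professor and doctoral supervisor; SUN Ying, lecturer; ZHANG Wei, doctoral student.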
Funding: National Natural Science Foundation of China (61371193); Shanxi Province Research Fund for Returned Overseas Scholars (2013-034).



Abstract: Feature extraction is a key step in an emotional speech recognition system and determines its overall recognition performance. Traditional feature extraction techniques assume that the speech signal is linear and short-time stationary, and they are not adaptive. This paper therefore extracts features in a nonlinear way using the Ensemble Empirical Mode Decomposition (EEMD) algorithm. First, EEMD decomposes the emotional speech signal into a set of Intrinsic Mode Functions (IMFs), and a subset of effective IMFs is selected with a correlation coefficient method. IMF Energy (IMFE) features are then computed from the selected IMFs. In the experiments, the Berlin (German) emotional speech database is used as the data source. IMFE features, prosodic features, Mel-Frequency Cepstral Coefficients (MFCC), and the fusion of all three are each fed into a Support Vector Machine (SVM), and the recognition results of the different feature sets are compared to validate the IMFE features. The experimental results show that fusing IMFE features with acoustic features yields an average recognition rate of 91.67%, and that IMFE features can effectively distinguish between different emotional states.
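The decomposition step summarized above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: `emd` uses natural cubic-spline envelopes pinned to the end samples and a fixed number of sifting passes instead of a stopping criterion, and `eemd` applies the ensemble idea behind EEMD (add white Gaussian noise to the signal, decompose each noisy copy, average the IMFs). The function names and defaults (`trials=50`, `noise_ratio=0.2`, `max_imfs=6`, `sift_iters=10`) are hypothetical choices, not the settings used in the paper.

```python
import bisect
import math
import random
import statistics


def _natural_spline(xs, ys, n_out):
    """Evaluate a natural cubic spline through knots (xs, ys) at t = 0..n_out-1."""
    k = len(xs)
    if k == 2:  # only the two end knots: fall back to a straight line
        slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
        return [ys[0] + slope * (t - xs[0]) for t in range(n_out)]
    h = [xs[i + 1] - xs[i] for i in range(k - 1)]
    # Tridiagonal system for the knot second derivatives m, with m[0] = m[-1] = 0.
    sub, diag, sup, rhs = [0.0] * k, [1.0] * k, [0.0] * k, [0.0] * k
    for i in range(1, k - 1):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, k):  # Thomas algorithm: forward elimination
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * k
    m[-1] = rhs[-1] / diag[-1]
    for i in range(k - 2, -1, -1):  # back substitution
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]
    out = []
    for t in range(n_out):
        j = min(max(bisect.bisect_right(xs, t) - 1, 0), k - 2)
        x0, x1, hj = xs[j], xs[j + 1], h[j]
        out.append(m[j] * (x1 - t) ** 3 / (6 * hj) + m[j + 1] * (t - x0) ** 3 / (6 * hj)
                   + (ys[j] / hj - m[j] * hj / 6) * (x1 - t)
                   + (ys[j + 1] / hj - m[j + 1] * hj / 6) * (t - x0))
    return out


def _extrema(x):
    """Indices of interior local maxima and minima (plateaus handled crudely)."""
    idx = range(1, len(x) - 1)
    maxima = [i for i in idx if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in idx if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def _envelope_mean(h, maxima, minima):
    """Mean of the upper/lower spline envelopes; both are pinned to the end samples."""
    n = len(h)
    up = _natural_spline([0] + maxima + [n - 1], [h[0]] + [h[i] for i in maxima] + [h[-1]], n)
    lo = _natural_spline([0] + minima + [n - 1], [h[0]] + [h[i] for i in minima] + [h[-1]], n)
    return [(u + l) / 2.0 for u, l in zip(up, lo)]


def emd(x, max_imfs=6, sift_iters=10):
    """Plain EMD with a fixed number of sifting passes per IMF (no SD stopping rule).

    Returns (imfs, residue); by construction sum(imfs) + residue == x.
    """
    residue, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left: keep the rest as the residue (trend)
        h = residue[:]
        for _ in range(sift_iters):
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            h = [a - b for a, b in zip(h, _envelope_mean(h, maxima, minima))]
        imfs.append(h)
        residue = [r - c for r, c in zip(residue, h)]
    return imfs, residue


def eemd(x, trials=50, noise_ratio=0.2, max_imfs=6, sift_iters=10, seed=0):
    """Ensemble EMD: average the IMFs of `trials` copies of x plus white Gaussian noise.

    noise_ratio sets the noise standard deviation relative to std(x). Trials that
    yield fewer than max_imfs IMFs contribute zeros to the missing slots.
    """
    rng = random.Random(seed)
    sigma = noise_ratio * (statistics.pstdev(x) or 1.0)
    n = len(x)
    mean_imfs = [[0.0] * n for _ in range(max_imfs)]
    mean_res = [0.0] * n
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        imfs, res = emd(noisy, max_imfs, sift_iters)
        for k, imf in enumerate(imfs):
            for i, v in enumerate(imf):
                mean_imfs[k][i] += v / trials
        for i, v in enumerate(res):
            mean_res[i] += v / trials
    return mean_imfs, mean_res


if __name__ == "__main__":
    fs = 128  # hypothetical toy signal: 20 Hz + 3 Hz tones, 1 s sampled at 128 Hz
    x = [math.sin(2 * math.pi * 20 * i / fs) + 0.5 * math.sin(2 * math.pi * 3 * i / fs)
         for i in range(fs)]
    imfs, res = eemd(x, trials=20)
    print([round(sum(v * v for v in c), 2) for c in imfs])  # energy of each averaged IMF
```

Because each trial's IMFs and residue sum exactly to that trial's noisy signal, the averaged IMFs plus the averaged residue reconstruct the input up to the mean of the added noise, whose spread shrinks as `trials` grows.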

Key words: feature extraction, Ensemble Empirical Mode Decomposition (EEMD), Intrinsic Mode Function (IMF), Spearman rank correlation coefficient, acoustic feature, emotional speech recognition

CLC number: TN912.3

ZHANG Le,ZHANG Xueying,SUN Ying,ZHANG Wei. Feature Extraction of Emotional Speech Based on Ensemble Empirical Mode Decomposition[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2017.08.052.

http://www.ecice06.com/CN/Y2017/V43/I8/306

Editor: LU Yanfei
