Computer Engineering ›› 2019, Vol. 45 ›› Issue (8): 248-254. doi: 10.19678/j.issn.1000-3428.0053232

• Multimedia Technology and Application •

Speech Emotion Recognition Method Based on Multiple Kernel Learning Feature Fusion

WANG Zhongmin (a, b), LIU Ge (a), SONG Hui (a, b)

  1. a. School of Computer Science and Technology; b. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
  • Received: 2018-11-23 Revised: 2019-01-11 Online: 2019-08-15 Published: 2019-08-08
  • About the authors: WANG Zhongmin (born 1967), male, professor, Ph.D.; main research interests: speech emotion recognition, embedded systems, and intelligent information processing. LIU Ge, master's student. SONG Hui, lecturer, M.S.
  • Funding:
    National Natural Science Foundation of China (61373116); Shaanxi Province Science and Technology Coordinated Innovation Project (2016KTZDGY04-01); Special Scientific Research Program of the Shaanxi Provincial Department of Education (16JK1706); Science and Technology Program of the Xi'an Science and Technology Bureau (2017084CG/RC047(XAYD001)); Graduate Innovation and Entrepreneurship Fund of Xi'an University of Posts and Telecommunications (CXJJ2017061).

Abstract: In speech emotion recognition, extracting Mel-Frequency Cepstral Coefficients (MFCC) discards spectral feature information, which lowers recognition accuracy. To address this, a speech emotion recognition method that combines MFCC and spectrogram features is proposed. MFCC features are extracted from the audio signal, the signal is also converted into a spectrogram, and a Convolutional Neural Network (CNN) extracts image features from that spectrogram. On this basis, a Multiple Kernel Learning (MKL) algorithm fuses the audio features, and the resulting kernel function is applied to a Support Vector Machine (SVM) for emotion classification. Experimental results on two speech emotion datasets show that, in comparison with single-feature classifiers, the method attains a speech emotion recognition accuracy of up to 96%.

Keywords: speech emotion recognition, Multiple Kernel Learning (MKL), Convolutional Neural Network (CNN), Mel-Frequency Cepstral Coefficients (MFCC), spectrogram
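The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which MKL algorithm or kernels are used, so the code below makes several assumptions: an RBF kernel per feature view, a simple kernel-target alignment heuristic standing in for a real MKL solver, two classes labeled +1 and -1, and made-up toy vectors (`mfcc_feats`, `cnn_feats`) in place of real MFCC and CNN features. All function and variable names are hypothetical.

```python
import math

def rbf_gram(features, gamma=1.0):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(features)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            gram[i][j] = math.exp(-gamma * sq_dist)
    return gram

def frobenius(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K, labels):
    """Kernel-target alignment between K and the ideal kernel y y^T."""
    ideal = [[yi * yj for yj in labels] for yi in labels]
    return frobenius(K, ideal) / math.sqrt(frobenius(K, K) * frobenius(ideal, ideal))

def fuse_kernels(kernels, labels):
    """Weight each kernel by its (non-negative) alignment and sum them.

    This is a heuristic stand-in for MKL, not the solver used in the paper.
    """
    scores = [max(alignment(K, labels), 0.0) for K in kernels]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:
        weights = [1.0 / len(kernels)] * len(kernels)  # fall back to uniform
    n = len(labels)
    fused = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
              for j in range(n)] for i in range(n)]
    return weights, fused

# Toy data (assumed values): the first view separates the classes, the second barely does.
mfcc_feats = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
cnn_feats = [[0.5, 0.5], [0.4, 0.6], [0.6, 0.4], [0.5, 0.6]]
labels = [1, 1, -1, -1]

kernels = [rbf_gram(mfcc_feats), rbf_gram(cnn_feats)]
weights, fused = fuse_kernels(kernels, labels)
print("kernel weights:", weights)
```

The fused Gram matrix is the kind of object an SVM that accepts a precomputed kernel could consume, for example scikit-learn's `SVC(kernel="precomputed")`. A multi-class emotion task would also need a multi-class scheme such as one-vs-rest, which this sketch leaves out.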

CLC Number: