Method for Chinese Speech Emotion Recognition Based on Improved Speech-Processing Convolutional Neural Network

doi:10.19678/j.issn.1000-3428.0060270

Abstract: Speech emotion recognition plays an important role in human-computer interaction. To address the low efficiency and accuracy of Chinese speech emotion recognition, this study proposes a Chinese speech emotion recognition method based on the Trumpet-6 convolutional neural network model. During Mel Frequency Cepstral Coefficient (MFCC) feature extraction, the number of sampling points used in the framing and windowing operation is increased, so that each Hamming window contains more features and fewer Hamming windows are needed. This reduces the pixel size of the MFCC feature map and improves the processing efficiency of a single recognition. On this basis, Gaussian white noise is used to augment the dataset, which alleviates overfitting during training. Experimental results on the CASIA speech emotion dataset show that the method reaches a test accuracy of 95.7%, which is better than that of traditional methods such as LeNet-5, Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). The Trumpet-6 convolutional neural network model uses 2,048 sampling points and requires only 176,550 trainable parameters. Compared with ResNet34, which uses a deep convolutional neural network (DCNN), and with recurrent neural network models, it has fewer parameters, a simpler structure, and faster processing.
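The two preprocessing ideas in the abstract (longer frames giving fewer Hamming windows, and Gaussian white noise augmentation) can be illustrated with a minimal NumPy sketch. Only the 2,048-sample frame length comes from the abstract. The 16 kHz sample rate, the 2-second clip, the 50% frame overlap, the 512-sample baseline, the 20 dB signal-to-noise ratio, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def num_frames(n_samples: int, frame_len: int, hop_len: int) -> int:
    """Number of full frames that fit in the signal when no padding is used."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len


def frame_and_window(signal: np.ndarray, frame_len: int, hop_len: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n = num_frames(len(signal), frame_len, hop_len)
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n)[:, None]
    return signal[idx] * np.hamming(frame_len)


def add_white_noise(signal: np.ndarray, snr_db: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a copy of the signal with Gaussian white noise at the given SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Hypothetical 2-second clip at an assumed 16 kHz sample rate (32,000 samples).
sample_rate = 16000
clip = np.sin(2 * np.pi * 220.0 * np.arange(2 * sample_rate) / sample_rate)

# Assumed 512-sample baseline versus the 2,048-sample frame from the abstract,
# both with 50% overlap (the overlap is an assumption).
short_frames = frame_and_window(clip, frame_len=512, hop_len=256)
long_frames = frame_and_window(clip, frame_len=2048, hop_len=1024)
print(short_frames.shape)  # (124, 512)
print(long_frames.shape)   # (30, 2048)

# One augmented copy of the clip at an assumed 20 dB SNR.
augmented = add_white_noise(clip, snr_db=20.0, rng=np.random.default_rng(0))
```

Under these assumptions the longer frame cuts the number of windows from 124 to 30, so a feature map with one column per window shrinks along the time axis by roughly a factor of four. The noisy copy would be added to the training set alongside the clean clip.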

Keywords: speech emotion recognition, MFCC features, Gaussian white noise, data augmentation, Convolutional Neural Network (CNN)

CLC number: TP391.4

QIAO Dong, CHEN Zhangjin, DENG Liang, TU Chengli. Method for Chinese Speech Emotion Recognition Based on Improved Speech-Processing Convolutional Neural Network[J]. Computer Engineering, 2022, 48(2): 281-290.

http://www.ecice06.com/CN/Y2022/V48/I2/281

Figures/Tables: 17

