Diplomatic Temporal Convolutional Network with Multi-Task Learning for Network Public Opinion Analysis

doi:10.19678/j.issn.1000-3428.0065977

Abstract

Detecting and recognizing the emotional state of speech on the Internet helps in monitoring public opinion, and identifying the speaker and the speaker's gender at the same time gives further insight into the true intent behind that opinion. Based on the EMODB dataset, a Diplomatic Temporal Convolutional Network (DTCN) with Multi-Task Learning (MTL) is proposed for emotion classification, speaker identification, and gender recognition. To address the small size of the dataset in multi-task learning, a data augmentation technique is designed that expands EMODB by adding noise at different Signal-to-Noise Ratios (SNRs), yielding the single-SNR noisy datasets EMODB-10, EMODB-5, EMODB0, EMODB5, and EMODB10 and the multi-SNR noisy dataset EMODBM. In addition, single and hybrid noise are studied to assess how different noise types affect the performance of the DTCN model. To better represent the data, an acoustic feature set suited to multi-task learning is proposed. Experimental results show that, when tested on the positive-SNR and multi-SNR noisy datasets, the DTCN model outperforms the baseline in the multi-task learning scenario and recognizes speaker gender with relative ease. Moreover, as the number of noise types increases, multi-task performance continues to improve, and the model shows better robustness and generalization under hybrid noise.

Key words: speech emotion recognition, Diplomatic Temporal Convolutional Network (DTCN), Multi-Task Learning (MTL), data augmentation, feature extraction
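
The noise-based augmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the noise sources or the exact mixing procedure, so the following is an assumption: the noise is scaled so that the ratio of clean-signal power to noise power matches each target SNR, which is the usual definition. The function names `add_noise_at_snr`, `measured_snr_db`, and `augment` are hypothetical, and how the SNR levels are pooled into EMODBM is not covered here.

```python
import numpy as np

# SNR levels (in dB) named in the abstract: EMODB-10 ... EMODB10.
SNR_LEVELS_DB = (-10, -5, 0, 5, 10)


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the target SNR (dB).

    Assumption: SNR is defined as 10*log10(P_clean / P_noise), with power
    measured as the mean squared amplitude over the whole utterance.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean signal and noise must both have non-zero power")

    # Solve P_clean / (scale^2 * P_noise) = 10^(snr_db / 10) for scale.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def measured_snr_db(clean, noisy):
    """Measure the SNR (dB) of a mixture, given the clean reference."""
    residual = np.asarray(noisy) - np.asarray(clean)
    return 10 * np.log10(np.mean(np.asarray(clean) ** 2) / np.mean(residual ** 2))


def augment(clean, noise):
    """Build one noisy copy of an utterance per SNR level (hypothetical helper)."""
    return {snr: add_noise_at_snr(clean, noise, snr) for snr in SNR_LEVELS_DB}
```

Calling `augment(clean, noise)` on a single utterance returns a dictionary keyed by SNR level. Applying it to every utterance in a corpus would give one noisy copy of the corpus per level, in the spirit of the single-SNR datasets listed above.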

Huiyun ZHANG, Heming HUANG. Diplomatic Temporal Convolutional Network with Multi-Task Learning for Network Public Opinion Analysis[J]. Computer Engineering, 2023, 49(10): 89-96, 104.

http://www.ecice06.com/CN/Y2023/V49/I10/89

Figures/Tables

Fig.1 Proportion of samples in the EMODB dataset and its augmented datasets

Fig.2 Structure of the DTCN model

Fig.3 Visualization of causal convolution

Fig.4 Visualization of dilated convolution

Fig.5 Structure of the residual block

Fig.6 Multi-task recognition accuracy of different models on the EMODB0 dataset

Fig.7 Multi-task recognition results of the DTCN model on the augmented EMODB datasets

Fig.8 Multi-task recognition results of the DTCN model under different noise types

Fig.9 Multi-task recognition performance of the DTCN model under hybrid noise

