
Computer Engineering, 2023, Vol. 49, Issue (10): 89-96, 104. doi: 10.19678/j.issn.1000-3428.0065977

• Artificial Intelligence and Pattern Recognition •

Multi-Task Learning Diplomatic Temporal Convolutional Network for Network Public Opinion Analysis

ZHANG Huiyun1,2, HUANG Heming1,2,*

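  1. College of Computer, Qinghai Normal University, Xining 810008
    2. The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008
  • Received: 2022-10-12 Publication date: 2023-10-15 Release date: 2023-01-12
  • Corresponding author: HUANG Heming
  • About the author:

    ZHANG Huiyun (born 1993), female, Ph.D. candidate; main research interests: pattern recognition, intelligent systems, and speech emotion recognition

  • Funding:
    National Natural Science Foundation of China (62066039); Natural Science Foundation of Qinghai Province (2022-ZJ-925)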

Diplomatic Temporal Convolutional Network with Multi-Task Learning for Network Public Opinion Analysis

Huiyun ZHANG1,2, Heming HUANG1,2,*   

  1. College of Computer, Qinghai Normal University, Xining 810008, China
    2. The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, China
  • Received:2022-10-12 Online:2023-10-15 Published:2023-01-12
  • Contact: Heming HUANG

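Abstract (translated from the Chinese):

Detecting and recognizing the emotional state of speech on the network helps keep public opinion under control, and identifying the speaker and the speaker's gender at the same time gives a better grasp of the true intent behind that opinion. Based on the EMODB dataset, a multi-task learning Diplomatic Temporal Convolutional Network (DTCN) is proposed for emotion classification, speaker identification, and gender recognition. To address the small size of the dataset in multi-task learning, a data augmentation technique is designed that expands EMODB by adding noise at different signal-to-noise ratios (SNRs), yielding the single-SNR noisy datasets EMODB-10, EMODB-5, EMODB0, EMODB5, and EMODB10 and the multi-SNR noisy dataset EMODBM. In addition, single and mixed noise are studied to verify how different noises affect the performance of the DTCN model. To better characterize the data, an acoustic feature set suited to multi-task learning is proposed. Experimental results show that, when tested on noisy datasets with positive SNRs and on the multi-SNR dataset, the DTCN model outperforms the baseline in multi-task learning scenarios and identifies speaker gender relatively easily. Moreover, as the number of noise types increases, its multi-task learning performance keeps improving, and its robustness and generalization are better under mixed noise.

Keywords: speech emotion recognition, Diplomatic Temporal Convolutional Network, multi-task learning, data augmentation, feature extraction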

Abstract:

In analyzing public opinion expressed in online speech, it is important to understand the underlying intent of an utterance. Extracting additional information, such as the emotional state of the speech and the identity and gender of the speaker, gives deeper insight into the true intentions behind public opinion. First, a novel Diplomatic Temporal Convolutional Network (DTCN) is designed for Multi-Task Learning (MTL), covering emotion classification, speaker recognition, and gender recognition. Second, a data augmentation technique expands the EMODB dataset by adding noise at various Signal-to-Noise Ratios (SNRs). This yields the single-SNR noisy datasets EMODB-10, EMODB-5, EMODB0, EMODB5, and EMODB10, as well as a multi-SNR noisy dataset, EMODBM. In addition, single and hybrid noise are studied to assess how different noise types affect the performance of the DTCN model. An acoustic feature set suited to MTL is also proposed to enhance data representation. Experimental results indicate that the DTCN model outperforms the baseline in MTL scenarios when tested on noisy datasets with positive SNRs and on the multi-SNR dataset. It also recognizes speaker gender relatively easily. Finally, as more noise types are introduced, the MTL performance of the model continues to improve, and its robustness and generalization are better under hybrid noise.

Key words: speech emotion recognition, Diplomatic Temporal Convolutional Network (DTCN), Multi-Task Learning (MTL), data augmentation, feature extraction
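
The abstract describes expanding EMODB by adding noise at fixed SNRs (-10, -5, 0, 5, and 10 dB, judging by the dataset names). The abstract does not give the mixing procedure, so the following is only a minimal sketch of one common way to do it: scale a noise signal so that the ratio of clean-signal power to noise power matches the target SNR, then add it. The function names `add_noise_at_snr` and `measured_snr_db`, the synthetic sine "utterance", and the Gaussian noise source are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def add_noise_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise such that the mixture has the given SNR in dB.

    Hypothetical helper: the paper's actual mixing procedure is not given in the abstract.
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0 or noise_power == 0:
        raise ValueError("signal and noise must have non-zero power")
    # SNR_dB = 10 * log10(P_signal / P_scaled_noise); solve for the noise gain.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [x + gain * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture from the clean signal and the noisy result."""
    residual = [y - x for x, y in zip(clean, noisy)]
    clean_power = sum(x * x for x in clean) / len(clean)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(clean_power / residual_power)


# Stand-in data (assumption): a 1-second 220 Hz tone at 16 kHz and white Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# SNR levels inferred from the dataset names EMODB-10 ... EMODB10.
SNR_LEVELS_DB = [-10, -5, 0, 5, 10]
augmented = {snr: add_noise_at_snr(clean, noise, snr) for snr in SNR_LEVELS_DB}
```

In this sketch each entry of `augmented` plays the role of one single-SNR copy of an utterance. How the multi-SNR set EMODBM is assembled from such copies is not stated in the abstract and is left open here.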