Computer Engineering, 2020, Vol. 46, Issue (6): 65-72. doi: 10.19678/j.issn.1000-3428.0054127

• Artificial Intelligence and Pattern Recognition •

Dimensional Emotion Recognition Method Based on Hierarchical Attention Mechanism

TANG Yuhao, MAO Qirong, GAO Lijian

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  • Received: 2019-03-07 Revised: 2019-04-20 Published: 2019-05-29
  • About the authors: TANG Yuhao (born 1994), male, master's student, main research interest: multimodal emotion recognition; MAO Qirong, professor; GAO Lijian, master's student.
  • Funding:
    National Natural Science Foundation of China (61672267, 61672268).

Abstract: In continuous dimensional emotion recognition, the segments that most strongly express emotion differ within each modality, and the modalities differ in how much they influence the emotional state. To address this, this paper proposes a multimodal dimensional emotion recognition model based on a Hierarchical Attention Mechanism (HAM), which learns the features of each modality and fuses them in a reasonable way. A frequency attention mechanism is added to the audio modality to learn contextual information in the frequency domain, and a multimodal attention mechanism fuses the video features with the audio features. An improved loss function mitigates the missing-modality problem, improving the robustness of the model and its emotion recognition performance. Experimental results on public datasets show that, compared with methods such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks, the model markedly improves the Concordance Correlation Coefficient (CCC) and achieves higher recognition efficiency, making it applicable to dimensional emotion recognition on large volumes of data.

Key words: multimodality, continuous dimensional emotion recognition, attention mechanism, feature fusion, deep learning
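The abstract reports results in terms of the Concordance Correlation Coefficient (CCC) without defining it. As a minimal sketch, assuming the standard definition of Lin's CCC with population (biased) variances, and not any evaluation code from the paper, the metric can be computed as below. The function name `concordance_correlation_coefficient` and the sample values are illustrative only.

```python
from statistics import fmean


def concordance_correlation_coefficient(pred, gold):
    """Lin's CCC between predicted and gold-standard sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)

    Assumption: population (divide-by-n) variance and covariance are used.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    n = len(pred)
    mean_p = fmean(pred)
    mean_g = fmean(gold)

    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and equal; the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


# Hypothetical arousal predictions versus annotations.
same = concordance_correlation_coefficient([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_correlation_coefficient([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Unlike the Pearson correlation, CCC penalizes a constant offset between prediction and annotation: the shifted sequence above is perfectly correlated with the original, yet its CCC falls below 1 because of the mean-difference term in the denominator.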

CLC Number: