
计算机工程


Multimodal sentiment analysis for videos

  • Published: 2023-10-30

Abstract: Multimodal sentiment analysis aims to extract and integrate semantic information from text, image, and audio data in order to identify the emotional state of speakers in online videos. Although multimodal fusion schemes have achieved some success in this field, previous work still falls short in handling distributional differences across modalities and in fusing relational knowledge. This paper proposes a novel multimodal prompt gate module that converts non-verbal information into prompts fused with the textual context: textual information is used to filter noise from the non-verbal signals, yielding prompts rich in semantic information and strengthening information integration across modalities. In addition, an instance-to-label contrastive learning framework is proposed that separates different labels in the latent space at the semantic level, further refining the model output. Experimental results on three large-scale sentiment analysis datasets show that the proposed method achieves state-of-the-art performance on both Chinese and English datasets and across different evaluation metrics: binary classification accuracy improves by about 0.7%, and three-class classification accuracy improves by more than 2.5%, reaching 67.1%. This work helps bring multimodal sentiment analysis to many application areas, such as user profiling, video understanding, and AI interviewing. In the future, it can also support research on the emotions of social media users and provide valuable experience for further mining the sentiment of social users.
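To make the two components described in the abstract more concrete, the sketch below illustrates one plausible reading of them: a text-conditioned gate that filters non-verbal (audio/visual) features and turns them into prompt vectors prepended to the text sequence, and a supervised instance-to-label contrastive loss that pulls together latent representations sharing a label. This is a minimal PyTorch sketch, not the paper's released implementation; the names MultimodalPromptGate and instance_to_label_contrastive_loss, the mean pooling of text tokens, the sigmoid gating form, and all dimensions are assumptions made here for illustration.

```python
# Minimal illustrative sketch (hypothetical, not the authors' code), assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalPromptGate(nn.Module):
    """Hypothetical prompt gate: pooled text context gates audio/visual features,
    and the gated non-verbal signal is projected into prompt vectors that are
    prepended to the text token sequence for a downstream encoder."""

    def __init__(self, text_dim: int, audio_dim: int, visual_dim: int, prompt_len: int = 4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, text_dim)
        self.visual_proj = nn.Linear(visual_dim, text_dim)
        # Gates conditioned on the text context are meant to filter non-verbal noise.
        self.audio_gate = nn.Linear(text_dim * 2, text_dim)
        self.visual_gate = nn.Linear(text_dim * 2, text_dim)
        self.to_prompt = nn.Linear(text_dim, prompt_len * text_dim)
        self.prompt_len = prompt_len
        self.text_dim = text_dim

    def forward(self, text_tokens, audio_feat, visual_feat):
        # text_tokens: (B, L, D) token-level text features
        # audio_feat / visual_feat: (B, Da) / (B, Dv) utterance-level features
        text_ctx = text_tokens.mean(dim=1)                       # (B, D) pooled text context
        a = self.audio_proj(audio_feat)                          # (B, D)
        v = self.visual_proj(visual_feat)                        # (B, D)
        # Text-conditioned sigmoid gates suppress noisy non-verbal dimensions.
        a = torch.sigmoid(self.audio_gate(torch.cat([text_ctx, a], dim=-1))) * a
        v = torch.sigmoid(self.visual_gate(torch.cat([text_ctx, v], dim=-1))) * v
        prompt = self.to_prompt(a + v).view(-1, self.prompt_len, self.text_dim)
        # Prepend the fused prompt to the text sequence.
        return torch.cat([prompt, text_tokens], dim=1)           # (B, prompt_len + L, D)


def instance_to_label_contrastive_loss(features, labels, temperature: float = 0.1):
    """Supervised (instance-to-label) contrastive loss: instances that share a label
    are pulled together in the latent space, all other pairs are pushed apart."""
    z = F.normalize(features, dim=-1)                            # (B, D) unit-norm embeddings
    sim = z @ z.t() / temperature                                # (B, B) pairwise similarities
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()    # 1 where labels match
    eye = torch.eye(len(labels), device=features.device)
    pos = pos - eye                                              # exclude self-pairs as positives
    logits = sim - 1e9 * eye                                     # mask self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos.sum(dim=1).clamp(min=1)                      # avoid division by zero
    return -(pos * log_prob).sum(dim=1).div(pos_count).mean()
```

In such a setup, a text encoder would consume the prompt-augmented sequence, and the contrastive term would typically be added to the main classification or regression objective with a small weight; how the paper actually combines these pieces is not specified in the abstract and is left open here.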