
计算机工程


Multimodal Sentiment Analysis with Private Feature Learning and Contrastive Learning

  • Published: 2025-07-31

Abstract: In multimodal sentiment analysis, traditional methods rely on directly fusing multimodal information, so each modality's private features are often overlooked during cross-modal interaction. This can reduce the model's accuracy and robustness when handling complex sentiment expressions. The problem is especially relevant in smart-education scenarios, where teachers must accurately judge students' learning states and emotional fluctuations from their speech, facial expressions, and textual feedback; improving the precision of multimodal sentiment analysis is therefore important for personalized teaching and classroom interaction. To address this issue, this study proposes a sentiment analysis model that combines private feature learning with contrastive learning. First, to fully exploit private features, the model compares the shared features with the original text, audio, and visual features by similarity, identifying the modality-specific information that cross-modal interaction neglects, and then fuses the private and shared features to strengthen the model's expressive capability. Second, a Modality-Agnostic Contrastive Loss (MACL) is proposed, which applies contrastive learning to the fused multimodal features, effectively exploiting the sentiment information in multimodal data, narrowing the gap between modalities, and yielding a unified sentiment representation. Experimental results on the CMU-MOSI and CMU-MOSEI datasets show that the model raises the F1 score to 85.98% and 85.95% and the binary classification accuracy to 86.01% and 85.97%, respectively, significantly outperforming the second-best models and validating the effectiveness of the proposed approach.