Computer Engineering

Multimodal Sentiment Analysis via Generative Completion and Dynamic Knowledge Fusion

  • Published: 2025-08-13

Abstract: Multimodal sentiment analysis uses multimodal data to infer human emotions. However, the performance of existing models degrades markedly in the presence of missing modality information, over-reliance on the text modality, and cross-modal conflicts. To address these issues, a multimodal sentiment analysis model based on generative completion and dynamic knowledge fusion (Generative Completion and Dynamic Knowledge Fusion Model, GC-DKF) is proposed. First, a generative prompt learning module completes the missing intra-modal and inter-modal information in the raw data, generating the absent modality features and improving the model's adaptability to uncertain-modality scenarios. Second, a dynamic dominant-modality selection mechanism chooses the dominant modality according to a sentiment proportion factor, while a knowledge encoder strengthens the representation of each individual modality, yielding knowledge-enhanced representations for every modality. Finally, guided by the dominant-modality features, the model further learns from the remaining secondary modalities to produce a complementary joint multimodal representation, enabling more efficient and accurate multimodal sentiment analysis. Experiments on the public CMU-MOSI and CMU-MOSEI datasets show that the proposed model outperforms mainstream multimodal sentiment recognition methods on binary classification accuracy, F1 score, mean absolute error, and Pearson correlation coefficient, with sentiment recognition accuracies of 83.55% and 83.02%, respectively. These results demonstrate that the proposed model is strongly competitive on multimodal sentiment recognition tasks.
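
To make the three-stage pipeline concrete, the following is a minimal PyTorch sketch of the flow the abstract describes: prompt-based completion of a missing modality, dominant-modality selection via a sentiment proportion factor, and dominant-guided fusion over the secondary modalities. This is not the authors' implementation: the class name GCDKFSketch, all dimensions, the use of learnable prompt vectors as a stand-in for the generative completion module, and the |score|-share form of the sentiment proportion factor are illustrative assumptions.

```python
# Minimal sketch of the GC-DKF pipeline described in the abstract.
# NOT the authors' code: module structure, dimensions, and the form of the
# sentiment proportion factor are assumptions for illustration only.
import torch
import torch.nn as nn

D = 128  # shared hidden size (assumption)

class GCDKFSketch(nn.Module):
    def __init__(self, dim=D, n_modalities=3):
        super().__init__()
        # Learnable prompt vectors stand in for the generative prompt
        # learning module: they replace the features of a missing modality.
        self.prompts = nn.Parameter(torch.randn(n_modalities, dim))
        # One "knowledge encoder" per modality (plain MLP placeholder here).
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_modalities)]
        )
        # Per-modality sentiment heads used to form the proportion factor.
        self.sent_heads = nn.ModuleList(
            [nn.Linear(dim, 1) for _ in range(n_modalities)]
        )
        # Cross-attention: the dominant modality queries all modalities.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.regressor = nn.Linear(dim, 1)  # sentiment intensity output

    def forward(self, feats, present):
        """feats: list of (B, D) per-modality features; present: list of bools."""
        B = next(f.size(0) for f, p in zip(feats, present) if p)
        # 1) Generative completion: missing modalities get learned prompts.
        completed = [
            f if p else self.prompts[i].expand(B, -1)
            for i, (f, p) in enumerate(zip(feats, present))
        ]
        # 2) Knowledge-enhanced unimodal representations.
        enhanced = [enc(x) for enc, x in zip(self.encoders, completed)]
        # Sentiment proportion factor: each modality's share of the total
        # |sentiment evidence| (one plausible reading of the abstract).
        scores = torch.stack(
            [head(h).abs().squeeze(-1) for head, h in zip(self.sent_heads, enhanced)],
            dim=1,
        )  # (B, M)
        proportion = scores / scores.sum(dim=1, keepdim=True).clamp_min(1e-8)
        dominant = proportion.argmax(dim=1)  # (B,) index of dominant modality
        # 3) Dominant-guided fusion: dominant features attend over all modalities.
        stack = torch.stack(enhanced, dim=1)            # (B, M, D)
        query = stack[torch.arange(B), dominant].unsqueeze(1)  # (B, 1, D)
        fused, _ = self.cross_attn(query, stack, stack)
        return self.regressor(fused.squeeze(1))         # (B, 1) sentiment score

# Usage: text and vision present, audio missing (filled by its prompt vector).
model = GCDKFSketch()
feats = [torch.randn(2, D), torch.zeros(2, D), torch.randn(2, D)]
out = model(feats, present=[True, False, True])
print(out.shape)  # torch.Size([2, 1])
```

The per-sample argmax over the proportion factor mirrors the "dynamic" aspect of dominant-modality selection: different samples in a batch can be driven by different modalities, while cross-attention lets the chosen modality pull in complementary information from the secondary ones.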