
Computer Engineering

   

Hybrid Encoding and Fuzzy Modeling for Multimodal Emotion Recognition in Conversations

  

Published: 2026-05-15


Abstract: Multimodal emotion recognition in conversations integrates language, acoustic, and visual information to automatically identify the emotions expressed in dialogues. This makes human-computer interaction more natural and more emotionally aware. Existing methods have three weaknesses. They model the multi-level contextual dependencies of emotion inadequately. Their multimodal feature fusion tends to introduce redundant information and noise. They also cannot effectively capture the uncertainty of emotional expression, which limits the recognition of complex emotion categories. To address these issues, this paper proposes a multimodal emotion recognition model that combines hybrid encoding and fuzzy modeling. A hybrid encoding module captures both the global dialogue context and local utterance-level dependencies, which strengthens the representation of emotional temporal features. A hierarchical gated fusion mechanism then integrates features from different modalities and layers with dynamic weights, suppressing redundancy and noise and making the multimodal features more discriminative. For emotion classification, the model uses a fuzzy neural network whose parameters are initialized at linearly spaced values. Its fuzzy membership functions model the boundaries between emotion categories and capture the uncertainty and fuzziness of emotional expression. Experimental results show that the proposed model outperforms the baseline methods on all metrics on the IEMOCAP, MELD, and CMU-MOSEI datasets. It achieves an accuracy of 72.67% on IEMOCAP and 67.37% on MELD. On CMU-MOSEI it reaches a 7-class accuracy of 54.96% and a 2-class accuracy of 86.78%. These results validate the effectiveness of the proposed method for multimodal sentiment analysis.
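
The abstract names two views of the dialogue, global context and local utterance-level dependencies, but it does not describe the layers that compute them. The following is a minimal sketch that relies on two assumptions. The global view is plain scaled dot-product self-attention over all utterances, with no learned projections. The local view is a sliding-window mean, standing in for whatever recurrent or convolutional encoder the model actually uses. The function names and the `window` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def global_context(utts: List[Vector]) -> List[Vector]:
    """Hypothetical global view: every utterance attends to every utterance
    in the dialogue (scaled dot-product self-attention, no learned weights)."""
    d = len(utts[0])
    out = []
    for q in utts:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in utts]
        m = max(scores)  # subtract the max before exp for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # Attention-weighted average of all utterance vectors.
        out.append([sum(wj / z * v[i] for wj, v in zip(w, utts)) for i in range(d)])
    return out


def local_context(utts: List[Vector], window: int = 1) -> List[Vector]:
    """Hypothetical local view: mean over the utterance and its neighbours
    within `window` positions (a stand-in for an RNN or CNN encoder)."""
    n, d = len(utts), len(utts[0])
    out = []
    for t in range(n):
        nbrs = utts[max(0, t - window): min(n, t + window + 1)]
        out.append([sum(v[i] for v in nbrs) / len(nbrs) for i in range(d)])
    return out


def hybrid_encode(utts: List[Vector], window: int = 1) -> List[Vector]:
    """Concatenate the global and local views for each utterance."""
    return [g + l for g, l in zip(global_context(utts), local_context(utts, window))]
```

Consider a three-utterance dialogue with 2-dimensional features. For it, `hybrid_encode` returns one 4-dimensional vector per utterance. The first half summarizes the whole conversation, and the second half covers only the utterance and its immediate neighbours.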

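The abstract does not say how the fusion gates are computed. The sketch below assumes scalar, input-dependent gates normalized with a softmax. The paper may instead use sigmoid or vector-valued gates. Fusion happens at two levels: first across layers inside each modality, then across modalities. The names `gated_fuse` and `hierarchical_fuse` and the gate parameters are hypothetical. In a trained model the gate weights would be learned rather than hand-set.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Gate = Tuple[List[Vector], List[float]]  # (one weight vector per input, one bias per input)


def gated_fuse(features: Sequence[Vector], gate_w: Sequence[Vector],
               gate_b: Sequence[float]) -> Vector:
    """Hypothetical gated fusion of equally sized vectors.

    Input i gets a scalar score s_i = gate_w[i] . features[i] + gate_b[i].
    alpha = softmax(s) sets each input's contribution, so a low-scoring
    input (for example a noisy modality) is down-weighted.
    """
    scores = [sum(w * f for w, f in zip(wv, fv)) + b
              for fv, wv, b in zip(features, gate_w, gate_b)]
    m = max(scores)  # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(features[0])
    return [sum(a * fv[d] for a, fv in zip(alphas, features)) for d in range(dim)]


def hierarchical_fuse(per_modality_layers: Sequence[Sequence[Vector]],
                      layer_gates: Sequence[Gate], modality_gate: Gate) -> Vector:
    """Hypothetical two-level fusion: layers within each modality first,
    then the resulting per-modality vectors across modalities."""
    modality_vecs = [gated_fuse(layers, gw, gb)
                     for layers, (gw, gb) in zip(per_modality_layers, layer_gates)]
    mw, mb = modality_gate
    return gated_fuse(modality_vecs, mw, mb)
```

With all gate weights and biases at zero, every input receives the same weight and the fusion reduces to a plain average. A large bias on one input drives its weight toward 1, which is the dynamic down-weighting of the other inputs that the gating is meant to provide.
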
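
The fuzzy classification step can be illustrated with a small rule-based head. This sketch assumes Gaussian membership functions and one rule per emotion class. Rule centers are initialized on an evenly spaced grid to mirror the linearly spaced initialization described above. The class name, the grid range `[lo, hi]`, and the shared width `sigma` are illustrative choices, not the paper's configuration, and training is omitted.

```python
import math
from typing import List

Vector = List[float]


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced values from lo to hi inclusive (requires n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


class FuzzyRuleClassifier:
    """Hypothetical fuzzy classification head with one Gaussian rule per class.

    Rule k has center c[k][d] for each feature d and a shared width sigma.
    Centers start on an evenly spaced grid over [lo, hi]; a real model would
    then learn centers and widths by gradient descent (not shown here).
    """

    def __init__(self, num_classes: int, dim: int, lo: float = -1.0, hi: float = 1.0):
        grid = linspace(lo, hi, num_classes)
        self.centers = [[grid[k]] * dim for k in range(num_classes)]
        # Width equals the grid spacing, so neighbouring rules overlap.
        self.sigma = (hi - lo) / (num_classes - 1)

    def log_firing(self, x: Vector) -> List[float]:
        """Log firing strength of each rule: the log of a product of Gaussian
        memberships, computed as a sum of logs for numerical stability."""
        return [
            sum(-((xd - cd) ** 2) / (2 * self.sigma ** 2) for xd, cd in zip(x, c))
            for c in self.centers
        ]

    def predict_proba(self, x: Vector) -> List[float]:
        """Normalized firing strengths: the degree to which x belongs to each class."""
        logs = self.log_firing(x)
        m = max(logs)
        w = [math.exp(v - m) for v in logs]
        total = sum(w)
        return [v / total for v in w]
```

With `FuzzyRuleClassifier(num_classes=3, dim=2)` the centers start at -1, 0, and 1. The input `[0.0, 0.0]` therefore belongs most strongly to the middle class. The input `[0.5, 0.5]` lies halfway between two centers and gets equal membership in the middle and upper classes. That graded overlap, rather than a hard decision surface, is what a fuzzy category boundary means here.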