
Computer Engineering ›› 2025, Vol. 51 ›› Issue (9): 80-90. doi: 10.19678/j.issn.1000-3428.0069705

• Artificial Intelligence and Pattern Recognition •


Multimodal Aspect-Based Sentiment Analysis Combining Local Perception and Multi-Level Attention

ZENG Biqing1,2,*, YAO Yongtao1, XIE Liangqi1, CHEN Pengfei1, DENG Huimin3, WANG Ruitang2

  1. School of Software, South China Normal University, Foshan 528225, Guangdong, China
    2. Aberdeen Institute of Data Science and Artificial Intelligence, South China Normal University, Foshan 528225, Guangdong, China
    3. School of Computing Science, Guangdong Agriculture Industry Business Polytechnic, Guangzhou 510507, Guangdong, China
  • Received: 2024-04-07  Revised: 2024-05-16  Online: 2025-09-15  Published: 2024-06-25
  • Contact: ZENG Biqing
  • Supported by: Special Project in Key Fields of Artificial Intelligence of Guangdong Provincial Universities (2019KZDZX1033); Guangdong Basic and Applied Basic Research Foundation (2021A1515011171); Basic and Applied Basic Research Project of the Guangzhou Basic Research Program (202102080282)


Abstract:

Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to identify the sentiment polarity of aspect words from image-text pairs. Existing methods focus on extracting sentiment features from both the image and the text. However, not every image or text feature is useful for the final sentiment prediction: images and texts typically contain large amounts of redundant and noisy information outside the regions related to an aspect word, and different regions of an image or a text may correspond to different aspect words, so noise is introduced at the early stage of image and text feature extraction. In addition, the aspect-related sentiment polarities of the image and the text may be opposite, which implies that interaction information exists between the two modalities. To address these issues, this paper proposes an MABSA model that combines local perception and multi-level attention. First, a local perception module is designed to select the text content and image regions that are semantically relevant to the aspect word. Then, a multi-level attention module is introduced, which uses a bottleneck attention mechanism to extract cross-modal interaction information and improves the accuracy of sentiment-information aggregation. Experimental results show that the model achieves State-Of-The-Art (SOTA) performance on the Twitter2015, Twitter2017, and Multi-ZOL datasets, significantly outperforming comparable models.

Key words: Multimodal Aspect-Based Sentiment Analysis (MABSA), local perception, multi-level attention, local context, bottleneck attention
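To make the fusion step described in the abstract more concrete, the following PyTorch sketch shows one common way a bottleneck attention mechanism can exchange cross-modal information: a few learnable bottleneck tokens first gather information from the text and image features, and each modality then reads the fused tokens back. This is a minimal illustration under assumed names and sizes (`BottleneckFusion`, 4 bottleneck tokens, 768-dimensional features); it is not the authors' implementation, and the local perception step is represented only by the assumption that the inputs are already aspect-filtered features.

```python
# Minimal, illustrative sketch of a bottleneck-attention fusion module.
# All names, token counts, and dimensions are assumptions for illustration;
# this is NOT the implementation from the paper.
import torch
import torch.nn as nn


class BottleneckFusion(nn.Module):
    """Exchange text/image information through a few shared bottleneck tokens."""

    def __init__(self, dim: int = 768, num_heads: int = 8, num_bottleneck: int = 4):
        super().__init__()
        # Learnable bottleneck tokens shared by both modalities.
        self.bottleneck = nn.Parameter(torch.randn(1, num_bottleneck, dim) * 0.02)
        # One attention module per modality (reused for both directions, for brevity).
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor):
        """
        text:  (B, L_t, dim) aspect-aware text features (assumed already filtered
               by a local-perception-style module).
        image: (B, L_v, dim) aspect-aware image-region features.
        Returns fused text and image features with unchanged shapes.
        """
        b = self.bottleneck.expand(text.size(0), -1, -1)

        # 1) Bottleneck tokens gather information from each modality.
        b_from_text, _ = self.text_attn(b, text, text)
        b_from_image, _ = self.image_attn(b, image, image)
        b = b + b_from_text + b_from_image  # narrow cross-modal channel

        # 2) Each modality reads the fused bottleneck tokens back.
        text_out, _ = self.text_attn(text, b, b)
        image_out, _ = self.image_attn(image, b, b)
        return self.norm_t(text + text_out), self.norm_v(image + image_out)


if __name__ == "__main__":
    fusion = BottleneckFusion()
    t = torch.randn(2, 50, 768)   # e.g., token-level text features
    v = torch.randn(2, 49, 768)   # e.g., 7x7 grid of image-region features
    t_f, v_f = fusion(t, v)
    print(t_f.shape, v_f.shape)   # torch.Size([2, 50, 768]) torch.Size([2, 49, 768])
```

Because all cross-modal exchange is routed through a handful of tokens, each modality can only inject a limited amount of information into the other, which matches the abstract's motivation of suppressing redundant and noisy regions during fusion.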