
Computer Engineering ›› 2026, Vol. 52 ›› Issue (3): 222-233. doi: 10.19678/j.issn.1000-3428.0070147

• Multimodal Information Fusion •

Target-Entity Sentiment Classification with Image-Text Multimodal Entity Alignment

ZHANG Tianzhi1, ZHOU Gang1,*, ZHANG Shuang2, CHEN Jing1, HUANG Ningbo1, WU Hao1

  1. School of Data and Target Engineering, The PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, Henan, China
    2. Information Engineering Department, Liaoning Provincial College of Communications, Shenyang 110122, Liaoning, China
  • Received: 2024-07-19  Revised: 2024-09-13  Online: 2026-03-15  Published: 2024-12-11
  • Contact: ZHOU Gang

  • About the authors:

    ZHANG Tianzhi, male, master's student; his main research interests are sentiment classification and data mining.

    ZHOU Gang (corresponding author), professor, Ph.D.

    ZHANG Shuang, associate professor, master's degree.

    CHEN Jing, Ph.D. candidate.

    HUANG Ningbo, Ph.D. candidate.

    WU Hao, master's student.

  • Funding:
    Science and Technology Research Project of Henan Province (222102210081)

Abstract:

With the increasing popularity of social media, Multimodal Sentiment Classification (MSC) has received widespread attention in recent years. Target-oriented Multimodal Sentiment Classification (TMSC), an important task in this field, aims to predict the sentiment polarity of a referred entity by combining information from multiple modalities, such as text and images. Although numerous modeling methods have been proposed for this task, they are still unable to achieve accurate entity alignment between text and images, which directly limits model accuracy on the target task. To address this problem, this study proposes a model for target-entity sentiment classification with Image-Text Multimodal Entity Alignment (ITMEA). The model first uses Adjective-Noun Pairs (ANPs) extracted from an image to construct sentiment auxiliary information, so that the key sentiment cues of the target entity in the image are expressed more directly. It also uses the multimodal Large Language Model (LLM) LLaMA-Adapter V2 to generate feature description information, further enabling accurate intermodal target-entity alignment. Moreover, in the intermodal feature fusion stage, the model constructs a gating mechanism that dynamically controls the input of non-text information, preventing information irrelevant to the text semantics from introducing additional interference. Experimental results on two Twitter benchmark datasets, Twitter-2015 and Twitter-2017, show that ITMEA improves accuracy by approximately 1.00 and 0.57 percentage points, respectively, over the best-performing baseline, validating the effectiveness and superiority of the proposed methods.
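The gating idea in the fusion stage can be illustrated with a minimal sketch: a sigmoid gate, computed from the concatenated text and image features, scales each image feature before it is added to the text representation, so image information unrelated to the text can be suppressed. This is only a generic illustration of gated fusion, not the authors' implementation; the function names, the element-wise gate form, and all weight and bias values below are assumptions for demonstration.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, gate_weights, gate_bias):
    """Fuse text and image feature vectors through an element-wise gate.

    For each dimension i, a gate scalar g_i = sigmoid(w_i . [text; image] + b_i)
    decides how much of the image feature passes through; the text feature
    always passes unchanged. All parameters here are hypothetical and untrained.
    """
    concat = list(text_vec) + list(image_vec)  # [text; image] concatenation
    fused = []
    for i, (t, v) in enumerate(zip(text_vec, image_vec)):
        score = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(score)       # g near 0 blocks the image feature, near 1 admits it
        fused.append(t + g * v)  # text passes through; image contribution is gated
    return fused
```

With a strongly negative bias the gate closes and the fused vector reduces to the text features alone; with a strongly positive bias the image features are admitted in full. In the actual model the gate parameters would be learned jointly with the rest of the network rather than fixed by hand.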

Key words: Multimodal Sentiment Classification (MSC), target-entity sentiment classification, Adjective-Noun Pairs (ANPs), multimodal Large Language Model (LLM), gating mechanism
