
Computer Engineering ›› 2026, Vol. 52 ›› Issue (3): 222-233. doi: 10.19678/j.issn.1000-3428.0070147

• Multimodal Information Fusion •

Target-Entity Sentiment Classification with Image-Text Multimodal Entity Alignment

ZHANG Tianzhi1, ZHOU Gang1,*, ZHANG Shuang2, CHEN Jing1, HUANG Ningbo1, WU Hao1

  1. School of Data and Target Engineering, The PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, Henan, China
    2. Information Engineering Department, Liaoning Provincial College of Communications, Shenyang 110122, Liaoning, China
  • Received: 2024-07-19  Revised: 2024-09-13  Online: 2026-03-15  Published: 2024-12-11
  • Contact: ZHOU Gang

  • About the authors:

    ZHANG Tianzhi, male, master's student; his main research interests are sentiment classification and data mining.

    ZHOU Gang (corresponding author), professor, Ph.D.

    ZHANG Shuang, associate professor, master's degree.

    CHEN Jing, Ph.D. candidate.

    HUANG Ningbo, Ph.D. candidate.

    WU Hao, master's student.

  • Funding:
    Science and Technology Research Project of Henan Province (222102210081)

Abstract:

With the increasing popularity of social media, Multimodal Sentiment Classification (MSC) has received widespread attention in recent years. Target-oriented Multimodal Sentiment Classification (TMSC), an important task in this field, aims to predict the sentiment polarity of a referred entity by combining information from multiple modalities, such as text and images. Although numerous modeling methods have been proposed for this task, they are still unable to achieve accurate entity alignment between text and images, which directly limits model accuracy on the target task. To address this problem, this study proposes a model for target-entity sentiment classification with Image-Text Multimodal Entity Alignment (ITMEA). The model first uses Adjective-Noun Pairs (ANPs) extracted from an image to construct sentiment auxiliary information, so that the key sentiment cues of the target entity in the image are expressed more directly. It also uses the multimodal Large Language Model (LLM) LLaMA-Adapter V2 to generate feature description information, further enabling accurate intermodal target-entity alignment. Moreover, in the intermodal feature fusion stage, the model constructs a gating mechanism that dynamically controls the input of non-text information, preventing information irrelevant to the text semantics from introducing additional interference. Experimental results on two Twitter benchmark datasets, Twitter-2015 and Twitter-2017, show that ITMEA improves accuracy by approximately 1.00 and 0.57 percentage points, respectively, over the best-performing baseline, validating the effectiveness and superiority of the proposed methods.
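The gating idea in the fusion stage can be illustrated with a minimal sketch: a sigmoid gate, computed from the concatenated text and image features, scales each image feature before it is added to the text representation, so image information unrelated to the text can be suppressed. This is only a generic illustration of gated fusion, not the authors' implementation; the function names, the element-wise gate form, and all weight and bias values below are assumptions for demonstration.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, gate_weights, gate_bias):
    """Fuse text and image feature vectors through an element-wise gate.

    For each dimension i, a gate scalar g_i = sigmoid(w_i . [text; image] + b_i)
    decides how much of the image feature passes through; the text feature
    always passes unchanged. All parameters here are hypothetical and untrained.
    """
    concat = list(text_vec) + list(image_vec)  # [text; image] concatenation
    fused = []
    for i, (t, v) in enumerate(zip(text_vec, image_vec)):
        score = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(score)       # g near 0 blocks the image feature, near 1 admits it
        fused.append(t + g * v)  # text passes through; image contribution is gated
    return fused
```

With a strongly negative bias the gate closes and the fused vector reduces to the text features alone; with a strongly positive bias the image features are admitted in full. In the actual model the gate parameters would be learned jointly with the rest of the network rather than fixed by hand.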

Key words: Multimodal Sentiment Classification (MSC), target-entity sentiment classification, Adjective-Noun Pairs (ANPs), multimodal Large Language Model (LLM), gating mechanism
