
Computer Engineering ›› 2024, Vol. 50 ›› Issue (4): 160-167. doi: 10.19678/j.issn.1000-3428.0067700

• Artificial Intelligence and Pattern Recognition •

Multimodal Relation Extraction Based on Bidirectional Attention Mechanism

Haipeng WU1,2,*, Yurong QIAN1,2,3, Hongyong LENG2,3

1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, Xinjiang, China
    2. Key Laboratory of Signal Detection and Processing of Xinjiang Uygur Autonomous Region, Urumqi 830046, Xinjiang, China
    3. College of Software, Xinjiang University, Urumqi 830046, Xinjiang, China
• Received: 2023-05-24  Online: 2024-04-15  Published: 2023-08-14
  • Contact: Haipeng WU

• Supported by: National Natural Science Foundation of China (61966035, 62266043); Major Special Project of the State Administration of Science, Technology and Industry for National Defense (95-Y50G37-9001-22/23)

Abstract:

Conventional relation extraction methods identify the relationships between pairs of entities from plain text, whereas multimodal relation extraction methods enhance relation extraction by leveraging information from multiple modalities. To address the susceptibility of existing multimodal relation extraction models to interference from redundant information in image data, this study proposes a multimodal relation extraction model based on a bidirectional attention mechanism. First, Bidirectional Encoder Representations from Transformers (BERT) and a scene graph generation model are used to extract textual and visual semantic features, respectively. Subsequently, a bidirectional attention mechanism establishes alignment both from images to text and from text to images, enabling bidirectional information exchange between the two modalities. This mechanism assigns lower weights to redundant information in images, thereby reducing its interference with the semantic representation of the text and mitigating its adverse effect on the relation extraction results. Finally, the aligned textual and visual feature representations are concatenated to form a fused text-image representation, and a Multi-Layer Perceptron (MLP) computes probability scores over all relation classes and outputs the predicted relation. Experimental results on the Multimodal Neural Relation Extraction (MNRE) dataset show that the model achieves precision, recall, and F1 scores of 65.53%, 69.21%, and 67.32%, respectively, significantly outperforming baseline models and demonstrating that the proposed mechanism effectively improves relation extraction.
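As a concrete illustration of the fusion stage described in the abstract, the following PyTorch sketch applies text-to-image and image-to-text cross-attention over precomputed BERT and scene-graph features, then concatenates the aligned streams and scores relations with an MLP. The class name, hidden size, head count, pooling strategy, and relation count are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BidirectionalAttentionFusion(nn.Module):
    # Hypothetical sketch of the described fusion stage; hyperparameters
    # (dim=768, 8 heads, mean pooling, 23 relations) are assumptions.
    def __init__(self, dim=768, num_heads=8, num_relations=23):
        super().__init__()
        # Cross-attention in both directions: redundant image regions tend
        # to receive low attention weights, limiting their effect on text.
        self.txt2img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # MLP scoring all relation classes over the fused representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_relations)
        )

    def forward(self, text_feats, img_feats):
        # text_feats: (B, Lt, dim) from BERT; img_feats: (B, Lv, dim) from
        # a scene graph generation model (both assumed precomputed).
        t_aligned, _ = self.txt2img(text_feats, img_feats, img_feats)
        v_aligned, _ = self.img2txt(img_feats, text_feats, text_feats)
        # Pool each aligned stream, concatenate, and classify.
        fused = torch.cat([t_aligned.mean(1), v_aligned.mean(1)], dim=-1)
        return self.classifier(fused)  # logits over relation labels
```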

Key words: relation extraction, social network, redundant information, multimodal data, bidirectional attention mechanism
