Computer Engineering (计算机工程) ›› 2024, Vol. 50 ›› Issue (8): 31-39. doi: 10.19678/j.issn.1000-3428.0068225

• Artificial Intelligence and Pattern Recognition •

Enhanced Domain Multi-modal Entity Recognition Based on Knowledge Graph

Huayu LI*, Zhikang ZHANG, Yang YAN, Yang YUE

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
  • Received: 2023-08-16  Online: 2024-08-15  Published: 2024-08-29
  • Contact: Huayu LI
  • Supported by: Natural Science Foundation of Shandong Province (ZR2020MF140); Graduate Innovation Fund of China University of Petroleum (East China) (22CX04035A)

Abstract:

To address the limitations of Chinese Named Entity Recognition (NER) in specific domains, this paper proposes a model that uses a discipline-specific Knowledge Graph (KG) and images to improve the accuracy of entity recognition in short texts from the computer science domain. The model employs a Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Attention network to extract textual features, ResNet152 to extract image features, and a word segmentation tool to obtain the noun entities in each sentence. The noun entities and KG nodes are embedded with BERT, and cosine similarity is used to find the KG node most similar to each segmented word; the neighboring nodes within a distance of 1 of that node are retained to generate an optimal matching subgraph that serves as a semantic supplement to the sentence. A Multi-Layer Perceptron (MLP) maps the textual, image, and subgraph features into the same space, and a gating mechanism performs fine-grained cross-modal fusion of the textual and image features. Finally, the multimodal features are fused with the subgraph features through a cross-attention mechanism and fed into the decoder for entity labeling. Experiments comparing the proposed method with baseline models on Twitter2015, Twitter2017, and a self-constructed computer science dataset show that it achieves a precision, recall, and F1 value of 88.56%, 87.47%, and 88.01%, respectively, on the domain dataset; compared with the best baseline model, the F1 value increases by 1.36 percentage points, demonstrating that incorporating a domain KG effectively improves entity recognition.
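The subgraph-matching step summarized above (segment the sentence, keep the noun tokens, embed them and the KG nodes with BERT, select the most similar node by cosine similarity, and retain its distance-1 neighbours as a semantic supplement) can be sketched as follows. This is only one possible reading of the abstract, not the authors' implementation; the library choices (transformers, jieba, networkx), the bert-base-chinese checkpoint, the mean pooling, and the per-noun matching strategy are assumptions.

import torch
import jieba.posseg as pseg
import networkx as nx
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    """Mean-pooled BERT embedding of a short phrase."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def matching_subgraph(sentence: str, kg: nx.Graph) -> nx.Graph:
    """For each noun produced by the segmenter, keep the most similar KG node
    (by cosine similarity) together with its distance-1 neighbours."""
    nouns = [p.word for p in pseg.cut(sentence) if p.flag.startswith("n")]
    node_vecs = {n: embed(str(n)) for n in kg.nodes}
    kept = set()
    for noun in nouns:
        v = embed(noun)
        best = max(node_vecs,
                   key=lambda n: torch.cosine_similarity(v, node_vecs[n], dim=0).item())
        kept |= {best} | set(kg.neighbors(best))
    return kg.subgraph(kept)

The nodes of the returned subgraph would then be encoded and passed, together with the text and image features, to the fusion stage described in the abstract.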

Key words: Named Entity Recognition (NER), multi-modal, domain, Knowledge Graph (KG), cross-modal feature fusion, attention mechanism
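For the fusion stage (MLP projections into a shared space, a gating mechanism for fine-grained text-image fusion, and cross-attention between the fused multimodal features and the subgraph features), a minimal PyTorch sketch is given below. The dimensions (768 for the text and graph encoders, 2048 for pooled ResNet152 region features), the sigmoid gate, and the single multi-head cross-attention layer are illustrative assumptions, not the authors' reported architecture.

import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    """Illustrative sketch: project text, image, and subgraph features into one
    space, gate the image contribution per text token, then let the fused
    features attend over the subgraph nodes."""
    def __init__(self, d_text=768, d_img=2048, d_graph=768, d_model=768, n_heads=8):
        super().__init__()
        def mlp(d_in):
            return nn.Sequential(nn.Linear(d_in, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.text_proj, self.img_proj, self.graph_proj = mlp(d_text), mlp(d_img), mlp(d_graph)
        self.gate = nn.Linear(2 * d_model, d_model)       # fine-grained, per-token gate
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text, image, subgraph):
        # text: (B, T, d_text) tokens, image: (B, R, d_img) regions, subgraph: (B, N, d_graph) nodes
        t, v, g = self.text_proj(text), self.img_proj(image), self.graph_proj(subgraph)
        v_ctx = v.mean(dim=1, keepdim=True).expand(-1, t.size(1), -1)
        gate = torch.sigmoid(self.gate(torch.cat([t, v_ctx], dim=-1)))
        fused = t + gate * v_ctx                          # gated text-image fusion
        out, _ = self.cross_attn(fused, g, g)             # query: fused features, key/value: subgraph
        return out                                        # sequence fed to the entity-labelling decoder

The output sequence would then be passed to a sequence-labelling decoder (for example a CRF layer) to produce the entity tags.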