Computer Engineering (计算机工程) ›› 2024, Vol. 50 ›› Issue (8): 31-39. doi: 10.19678/j.issn.1000-3428.0068225

• Artificial Intelligence and Pattern Recognition •

Enhanced Domain Multi-modal Entity Recognition Based on Knowledge Graph

Huayu LI*, Zhikang ZHANG, Yang YAN, Yang YUE

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
  • Received: 2023-08-16  Online: 2024-08-15  Published: 2024-08-29
  • Contact: Huayu LI
  • Supported by: Natural Science Foundation of Shandong Province (ZR2020MF140); Graduate Innovation Fund of China University of Petroleum (East China) (22CX04035A)

Abstract:

To address the limitations of Chinese Named Entity Recognition (NER) in specific domains, this paper proposes a model that uses a discipline-specific Knowledge Graph (KG) and images to improve the accuracy of entity recognition in short texts from the computer science domain. The model employs a Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Attention network to extract textual features, ResNet152 to extract image features, and a word segmentation tool to obtain the noun entities in each sentence. The noun entities and KG nodes are embedded with BERT, and cosine similarity is used to find the KG node most similar to each segmented word; the neighboring nodes within a distance of 1 of that node are retained to generate an optimal matching subgraph that serves as a semantic supplement to the sentence. A Multi-Layer Perceptron (MLP) maps the textual, image, and subgraph features into the same space, and a gating mechanism performs fine-grained cross-modal fusion of the textual and image features. Finally, the multimodal features are fused with the subgraph features through a cross-attention mechanism and fed into the decoder for entity labeling. Experiments comparing the proposed method with baseline models on Twitter2015, Twitter2017, and a self-constructed computer science dataset show that it achieves a precision, recall, and F1 value of 88.56%, 87.47%, and 88.01%, respectively, on the domain dataset; compared with the best baseline model, the F1 value increases by 1.36 percentage points, demonstrating that incorporating a domain KG effectively improves entity recognition.
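The subgraph-matching step summarized above (segment the sentence, keep the noun tokens, embed them and the KG nodes with BERT, select the most similar node by cosine similarity, and retain its distance-1 neighbours as a semantic supplement) can be sketched as follows. This is only one possible reading of the abstract, not the authors' implementation; the library choices (transformers, jieba, networkx), the bert-base-chinese checkpoint, the mean pooling, and the per-noun matching strategy are assumptions.

import torch
import jieba.posseg as pseg
import networkx as nx
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    """Mean-pooled BERT embedding of a short phrase."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def matching_subgraph(sentence: str, kg: nx.Graph) -> nx.Graph:
    """For each noun produced by the segmenter, keep the most similar KG node
    (by cosine similarity) together with its distance-1 neighbours."""
    nouns = [p.word for p in pseg.cut(sentence) if p.flag.startswith("n")]
    node_vecs = {n: embed(str(n)) for n in kg.nodes}
    kept = set()
    for noun in nouns:
        v = embed(noun)
        best = max(node_vecs,
                   key=lambda n: torch.cosine_similarity(v, node_vecs[n], dim=0).item())
        kept |= {best} | set(kg.neighbors(best))
    return kg.subgraph(kept)

The nodes of the returned subgraph would then be encoded and passed, together with the text and image features, to the fusion stage described in the abstract.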

Key words: Named Entity Recognition (NER), multi-modal, domain, Knowledge Graph (KG), cross-modal feature fusion, attention mechanism
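For the fusion stage (MLP projections into a shared space, a gating mechanism for fine-grained text-image fusion, and cross-attention between the fused multimodal features and the subgraph features), a minimal PyTorch sketch is given below. The dimensions (768 for the text and graph encoders, 2048 for pooled ResNet152 region features), the sigmoid gate, and the single multi-head cross-attention layer are illustrative assumptions, not the authors' reported architecture.

import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    """Illustrative sketch: project text, image, and subgraph features into one
    space, gate the image contribution per text token, then let the fused
    features attend over the subgraph nodes."""
    def __init__(self, d_text=768, d_img=2048, d_graph=768, d_model=768, n_heads=8):
        super().__init__()
        def mlp(d_in):
            return nn.Sequential(nn.Linear(d_in, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.text_proj, self.img_proj, self.graph_proj = mlp(d_text), mlp(d_img), mlp(d_graph)
        self.gate = nn.Linear(2 * d_model, d_model)       # fine-grained, per-token gate
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text, image, subgraph):
        # text: (B, T, d_text) tokens, image: (B, R, d_img) regions, subgraph: (B, N, d_graph) nodes
        t, v, g = self.text_proj(text), self.img_proj(image), self.graph_proj(subgraph)
        v_ctx = v.mean(dim=1, keepdim=True).expand(-1, t.size(1), -1)
        gate = torch.sigmoid(self.gate(torch.cat([t, v_ctx], dim=-1)))
        fused = t + gate * v_ctx                          # gated text-image fusion
        out, _ = self.cross_attn(fused, g, g)             # query: fused features, key/value: subgraph
        return out                                        # sequence fed to the entity-labelling decoder

The output sequence would then be passed to a sequence-labelling decoder (for example a CRF layer) to produce the entity tags.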