
Computer Engineering (计算机工程)


Enhanced Domain Multi-modal Entity Recognition based on Knowledge Graph

  • Published: 2023-12-05

Abstract: Chinese named entity recognition (NER) in specialized domains is hampered by short and ambiguous sentences. This paper proposes a model that uses a subject knowledge graph and images to improve entity recognition accuracy in short texts from the computer science domain. A BERT-BiLSTM model extracts text features, ResNet152 extracts image features, and a word segmentation tool extracts the noun entities in each sentence. BERT embeds both the noun entities and the graph nodes, and cosine similarity then identifies, for each segmented word in the sentence, the most similar node in the subject knowledge graph. That node's one-hop neighbors are retained to form a best-matching subgraph, which serves as a semantic supplement to the sentence. A multi-layer perceptron (MLP) maps the text, image, and subgraph features into a common space, and a gating structure performs fine-grained cross-modal fusion of the text and image features. Finally, a cross-attention mechanism fuses the multimodal features with the subgraph features, and the result is fed to a decoder for entity labeling. The model was compared with baseline models on Twitter2015, Twitter2017, and a self-built computer science dataset. On the domain dataset, its precision, recall, and F1 score reach 88.56%, 87.47%, and 88.01%, respectively, indicating that a domain knowledge graph improves entity recognition.
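The subgraph retrieval step described in the abstract (match each noun to its most similar graph node by cosine similarity, then keep that node's one-hop neighbors) might look roughly like the minimal sketch below. It is not the authors' implementation. The function names, the toy 3-dimensional vectors standing in for BERT embeddings, and the adjacency-dictionary graph representation are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a, b, eps=1e-12):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_unit @ b_unit.T


def best_matching_subgraph(noun_vecs, node_vecs, adjacency):
    """Return sorted node ids of the matched nodes plus their one-hop neighbors.

    noun_vecs: (num_nouns, dim) embeddings of the nouns in one sentence.
    node_vecs: (num_nodes, dim) embeddings of the knowledge graph nodes.
    adjacency: dict mapping node id -> iterable of neighboring node ids
               (a hypothetical representation chosen for this sketch).
    """
    sims = cosine_similarity_matrix(noun_vecs, node_vecs)
    anchors = sims.argmax(axis=1)  # most similar node for each noun
    nodes = set()
    for anchor in anchors:
        anchor = int(anchor)
        nodes.add(anchor)
        nodes.update(adjacency.get(anchor, ()))  # keep one-hop neighbors only
    return sorted(nodes)


# Toy data: in the paper's setting these vectors would come from BERT.
node_vecs = np.array([
    [1.0, 0.0, 0.0],  # node 0
    [0.0, 1.0, 0.0],  # node 1
    [0.0, 0.0, 1.0],  # node 2
    [1.0, 1.0, 0.0],  # node 3
])
adjacency = {0: [1], 1: [0], 2: [3], 3: [2]}
noun_vecs = np.array([[0.9, 0.1, 0.0]])  # one noun, closest to node 0

subgraph_nodes = best_matching_subgraph(noun_vecs, node_vecs, adjacency)
print(subgraph_nodes)
```

In this toy example the single noun matches node 0, so the retrieved subgraph consists of node 0 and its only neighbor, node 1. How the subgraph is then encoded and fused with the text and image features is not specified in enough detail in the abstract to sketch here.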