Computer Engineering (计算机工程)

Research on an Entity Linking Method Based on Contrastive Learning and Re-ranking

  • Published: 2025-04-09

Abstract: Entity linking (EL) is the task of linking entity mentions in natural-language text to the corresponding entities in a knowledge base, and it plays an important role in areas such as information retrieval and question answering. The challenge lies in using the context of a mention and the feature information of knowledge-base entities to generate candidate entities and select the correct one. Some existing approaches generate relevant candidates with a given strategy and then use feature information to select an appropriate entity, but they often fail to learn deeper semantic information. As a result, the candidates are of low quality, and the gold entity may be missing from the candidate set altogether. In addition, in some specialized domains entity information resources are insufficient, which leaves some methods unable to model interactions at multiple levels. To address these problems, this paper adopts a two-stage EL method that first generates high-quality candidate entities and then aggregates entity feature information to re-rank them at both coarse-grained and fine-grained levels. Specifically, the method retrieves high-quality candidates using contrastive learning with mixed negative sampling. It then predicts fine-grained entity types in a weakly supervised manner and re-ranks the candidates using both coarse-grained and fine-grained type information. Experiments on three public datasets show that the method effectively improves EL performance.
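The two stages named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives neither a loss function nor a scoring rule, so the InfoNCE-style loss, the reading of "mixed" negatives as in-batch plus hard negatives, the linear re-ranking score, and every name below (`info_nce_mixed`, `rerank`, `temperature`, `w_coarse`, `w_fine`) are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_mixed(mention, gold, in_batch_negs, hard_negs, temperature=0.1):
    """InfoNCE-style contrastive loss for a single mention embedding.

    Assumption: "mixed negative sampling" is read here as pooling
    in-batch negatives with hard negatives; the paper may differ.
    """
    negatives = list(in_batch_negs) + list(hard_negs)
    # Index 0 holds the gold entity; the rest are negatives.
    logits = [dot(mention, gold) / temperature]
    logits += [dot(mention, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of the gold entity under the softmax.
    return log_denominator - logits[0]


def rerank(candidates, w_coarse=0.5, w_fine=0.5):
    """Re-rank retrieved candidates with type-agreement scores.

    Each candidate is (entity_id, retrieval_score, coarse_match, fine_match).
    Assumption: a weighted sum stands in for the paper's unspecified
    coarse- and fine-grained re-ranking model.
    """
    scored = [
        (entity_id, score + w_coarse * coarse + w_fine * fine)
        for entity_id, score, coarse, fine in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch a hard negative that lies close to the mention embedding raises the loss more than a random in-batch negative does, which is the usual motivation for mixing the two kinds. The re-ranking step can promote a candidate with a lower retrieval score when its predicted types agree with the mention.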