Entity Linking Method Based on Prompt Scoring

Computer Engineering, 2025, Vol. 51, Issue (3): 334-341. doi: 10.19678/j.issn.1000-3428.0068442

GUO Junchen¹,²,*, MA Yutang², XIANG Yan¹,³, ZHAO Xuedong¹, GUO Junjun¹,³

1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
2. Electric Power Research Institute of Yunnan Power Grid Co., Ltd., Kunming 650217, Yunnan, China
3. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Received: 2023-09-22 Published: 2025-03-15 Released online: 2024-05-09
Corresponding author: GUO Junchen
Funding:
Yunnan Provincial Major Science and Technology Special Plan Project (202202AD080004); Yunnan Provincial Major Science and Technology Special Plan Project (202202AE090008); National Natural Science Foundation of China (62266025)

Abstract

Entity Linking (EL) aims to link mentions in natural language text to the corresponding target entities in a knowledge base. Its main difficulty is that mentions and candidate entities have limited representational capacity, which makes it hard to rank candidate entities precisely. Existing methods for improving representations, such as knowledge base expansion and graph embedding, depend on external data or knowledge, which limits their applicability. This study proposes a method for precisely ranking candidate entities against a mention in entity linking. A prompt question is constructed from the mention and its context, which converts the similarity computation between the mention and a candidate entity into a scoring task over the prompt question. A scorer is obtained by fine-tuning a pretrained model and outputs a similarity score for each mention-candidate pair. This score is then combined with the score from the candidate entity discovery stage to select a more accurate target entity. The process requires no additional knowledge and incorporates contextual information, so it measures the similarity between mentions and entities more accurately. In experiments against baseline models on two public datasets, the proposed model improves Acc@1 (top-1 accuracy) over the second-best model by 0.88 and 0.41 percentage points, respectively.

Key words: Entity Linking (EL), prompt question, pretrained model, entity disambiguation, precise ranking
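The pipeline summarized in the abstract (build a prompt question, score it, fuse the score with the candidate discovery score) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the prompt wording, so `build_prompt` uses a hypothetical template; the fusion is assumed to be a convex combination controlled by a weight `alpha` (the page only mentions a weight α in the caption of Fig.3); and `keyword_scorer` is a toy stand-in for the fine-tuned pretrained scorer, not the authors' model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    name: str
    retrieval_score: float  # score from the candidate discovery stage, assumed to lie in [0, 1]


def build_prompt(context: str, mention: str, candidate: str) -> str:
    """Hypothetical prompt template; the paper's actual wording is not given in the abstract."""
    return f'{context} In this sentence, does "{mention}" refer to "{candidate}"?'


def rank_candidates(
    context: str,
    mention: str,
    candidates: List[Candidate],
    scorer: Callable[[str], float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidates by alpha * prompt_score + (1 - alpha) * retrieval_score (assumed fusion)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = []
    for cand in candidates:
        prompt = build_prompt(context, mention, cand.name)
        score = alpha * scorer(prompt) + (1.0 - alpha) * cand.retrieval_score
        fused.append((cand.name, score))
    # Highest fused score first.
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


def keyword_scorer(prompt: str) -> float:
    """Toy stand-in for the fine-tuned scorer: favors one hard-coded candidate."""
    return 0.9 if '"Apple Inc."' in prompt else 0.2


candidates = [Candidate("Apple (fruit)", 0.6), Candidate("Apple Inc.", 0.5)]
ranking = rank_candidates(
    "Apple released a new phone.", "Apple", candidates, keyword_scorer, alpha=0.7
)
```

In this toy setup the discovery stage alone slightly prefers the wrong candidate, while the fused score with `alpha=0.7` puts "Apple Inc." first. In practice the scorer would be a fine-tuned pretrained model that maps each prompt question to a similarity score.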

GUO Junchen, MA Yutang, XIANG Yan, ZHAO Xuedong, GUO Junjun. Entity Linking Method Based on Prompt Scoring[J]. Computer Engineering, 2025, 51(3): 334-341.


Link to this article: https://www.ecice06.com/CN/10.19678/j.issn.1000-3428.0068442

https://www.ecice06.com/CN/Y2025/V51/I3/334

Figures/Tables (8)

Fig.1 Entity linking based on similarity calculation over dense vector representations

Fig.2 The model proposed in this paper

Fig.3 Influence of weight α on model performance
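To make the roles of the weight α and the Acc@1 metric concrete, the following Python sketch sweeps α over a small grid and reports top-1 accuracy at each value. It rests on stated assumptions: the fused score is taken to be `alpha * prompt_score + (1 - alpha) * retrieval_score`, which this page does not confirm, and the two examples are invented toy data, not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# One example: (gold entity, {candidate name: (prompt_score, retrieval_score)}).
Example = Tuple[str, Dict[str, Tuple[float, float]]]


def top1(cands: Dict[str, Tuple[float, float]], alpha: float) -> str:
    """Return the candidate with the highest fused score (assumed convex combination)."""
    return max(cands, key=lambda n: alpha * cands[n][0] + (1.0 - alpha) * cands[n][1])


def acc_at_1(examples: Sequence[Example], alpha: float) -> float:
    """Fraction of examples whose top-ranked candidate equals the gold entity."""
    if not examples:
        raise ValueError("examples must be non-empty")
    hits = sum(top1(cands, alpha) == gold for gold, cands in examples)
    return hits / len(examples)


def sweep_alpha(examples: Sequence[Example], grid: Sequence[float]) -> List[Tuple[float, float]]:
    """Evaluate Acc@1 at every alpha in the grid."""
    return [(alpha, acc_at_1(examples, alpha)) for alpha in grid]


# Invented toy data: the prompt score favors the gold entity in the first
# example, the discovery-stage score favors it in the second.
toy_examples: List[Example] = [
    ("A", {"A": (0.9, 0.2), "B": (0.3, 0.7)}),
    ("C", {"C": (0.4, 0.9), "D": (0.6, 0.1)}),
]

for alpha, acc in sweep_alpha(toy_examples, [0.0, 0.5, 1.0]):
    print(f"alpha={alpha:.1f}  Acc@1={acc:.2f}")
```

On this toy data, either score alone gets one of the two examples right, while an intermediate α gets both. That is the kind of trade-off a plot of α against performance would expose, though the actual curve in Fig.3 is not reproduced here.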



