
Computer Engineering, 2022, Vol. 48, Issue (8): 274-282, 291. doi: 10.19678/j.issn.1000-3428.0061596

• Development Research and Engineering Application •

Chinese-Vietnamese Cross-Language News Event Retrieval Incorporating Event Entity Knowledge

XUE Zhenyu (1, 2), YU Zhengtao (1, 2), GAO Shengxiang (1, 2)

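  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
  2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China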
  • Received: 2021-05-10; Revised: 2021-09-05; Published: 2022-08-09
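  • About the authors: XUE Zhenyu (born 1996), male, master's student, whose main research interests are natural language processing and cross-language information retrieval; YU Zhengtao, professor, Ph.D.; GAO Shengxiang (corresponding author), associate professor, Ph.D.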
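  • Funding:
    National Natural Science Foundation of China (61972186, 61762056, 61472168); National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Major Science and Technology Special Project of Yunnan Province (202002AD080001); Yunnan Province High-Tech Talent Project (201606, 202105AC160018); Basic Research Program of Yunnan Province (202001AS070014, 2018FB104).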

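Abstract: Existing Chinese-Vietnamese cross-language news event retrieval methods make little use of event entity knowledge from the news domain. When a candidate document contains multiple events, the events unrelated to the query sentence reduce the accuracy of matching between the query sentence and the candidate document, which degrades retrieval performance. This study proposes a Chinese-Vietnamese cross-language news event retrieval model that incorporates event entity knowledge. A query translation method translates Chinese event query sentences into Vietnamese event query sentences, turning the cross-language news event retrieval problem into a monolingual one. Because a query sentence contains only a single event, whereas multiple coexisting events in a candidate document hinder precise query-document matching, event trigger words are used to divide each candidate document into event ranges, reducing interference from events unrelated to the query. On this basis, a knowledge graph and the event trigger words are used to obtain rich knowledge representations of event entities, and interaction between the query sentence and the document event ranges yields two kinds of ranking features: those between event entity knowledge representations and words, and those among the event entity knowledge representations themselves. Experimental results on a Chinese-Vietnamese bilingual news dataset show that, compared with baseline models such as BM25, Conv-KNRM, and ATER, the proposed model achieves better cross-language news event retrieval performance, improving NDCG and MAP by up to 0.7122 and 0.5872, respectively.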
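The abstract says only that trigger words delimit event ranges and that ranking features come from interaction between the query and each range; it gives no concrete rules. The Python sketch below illustrates one plausible reading under stated assumptions and is not the authors' implementation. The split rule (a range runs from one trigger word to the token before the next) is hypothetical; the interaction features use KNRM-style Gaussian kernel pooling over query-range cosine similarities, a technique from the KNRM/Conv-KNRM family swapped in as a stand-in; the helper names, toy 2-D embeddings, kernel centers, and weights are all made up; and the knowledge-graph entity representations are left out. A document is scored by its best-matching range, so text about unrelated events cannot pull the score down.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Embeddings = Dict[str, Sequence[float]]  # hypothetical: every token has a vector


def split_event_ranges(tokens: List[str], triggers: Set[str]) -> List[List[str]]:
    """Split a tokenized document into event ranges, one per trigger word.

    Hypothetical rule: a range starts at a trigger and ends just before the
    next trigger; tokens before the first trigger join the first range.
    """
    positions = [i for i, tok in enumerate(tokens) if tok in triggers]
    if not positions:
        return [tokens]  # no trigger found: keep the whole document as one range
    bounds = [0] + positions[1:] + [len(tokens)]
    return [tokens[bounds[k]:bounds[k + 1]] for k in range(len(positions))]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kernel_features(query: List[str], doc_range: List[str], emb: Embeddings,
                    mus: Sequence[float], sigma: float = 0.1) -> List[float]:
    """KNRM-style kernel pooling over the query-range cosine similarity matrix."""
    feats = []
    for mu in mus:
        total = 0.0
        for q in query:
            # soft count of range tokens whose similarity to q lies near mu
            soft_tf = sum(math.exp(-(cosine(emb[q], emb[d]) - mu) ** 2 / (2 * sigma ** 2))
                          for d in doc_range)
            total += math.log(max(soft_tf, 1e-10))  # clamp to avoid log(0)
        feats.append(total)
    return feats


def score_document(query: List[str], tokens: List[str], triggers: Set[str],
                   emb: Embeddings, mus: Sequence[float],
                   weights: Sequence[float]) -> Tuple[float, int]:
    """Score every event range with a linear combination of kernel features
    and return (best score, index of the best range)."""
    ranges = split_event_ranges(tokens, triggers)
    scores = [sum(w * f for w, f in zip(weights, kernel_features(query, r, emb, mus)))
              for r in ranges]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Toy usage: query "flood"; the document mixes a flood event and an election event.
toy_emb: Embeddings = {
    "flood": [1.0, 0.0], "rain": [0.8, 0.6], "hits": [0.6, 0.8],
    "city": [0.0, 1.0], "election": [0.0, 1.0], "held": [0.6, 0.8],
}
doc = ["rain", "flood", "hits", "city", "election", "held"]
print(split_event_ranges(doc, {"flood", "election"}))
# [['rain', 'flood', 'hits', 'city'], ['election', 'held']]
score, best = score_document(["flood"], doc, {"flood", "election"}, toy_emb,
                             mus=[1.0, 0.5, 0.0], weights=[1.0, 0.5, 0.1])
print(best)  # 0: the flood range matches the query best
```

Taking the maximum over ranges is one simple way to let only the event the query asks about count; a learned ranker could instead weight or combine the per-range features.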

Key words: cross-language retrieval, event entity, event trigger, event range, learning to rank, event retrieval
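The reported gains are measured with NDCG and MAP. As a reference point for what those numbers capture, the Python sketch below spells out the standard definitions of both metrics. The cutoff k, the linear gain rel / log2(rank + 1) (some evaluations use 2^rel - 1 instead), the helper names, and the toy relevance labels are assumptions; the abstract does not say which variant or cutoff the experiments used.

```python
import math
from typing import List, Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG with linear gain: sum of rel_r / log2(r + 1) over ranks r = 1..k."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal ordering of the same judged labels."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


def average_precision(rels: Sequence[int], num_relevant: int) -> float:
    """Mean of precision@r over the ranks r that hold a relevant document."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(runs: List[Sequence[int]], num_relevant: List[int]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, n) for r, n in zip(runs, num_relevant)) / len(runs)


# Toy usage: binary relevance labels of ranked results for two queries.
print(round(ndcg_at_k([1, 0, 1, 0], k=4), 4))  # 0.9197
print(round(mean_average_precision([[1, 0, 1, 0], [0, 1]], [2, 1]), 4))  # 0.6667
```

Both metrics lie in [0, 1], so a gain is an absolute difference between two such scores.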
