
Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 143-149. doi: 10.19678/j.issn.1000-3428.0063662

• Artificial Intelligence and Pattern Recognition •

A General Model for Entity Relationship and Event Extraction

YANG Hongju1,2, JIN Xinyu1

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Received: 2021-12-30 Revised: 2022-03-16 Published: 2022-07-04
  • About the authors: YANG Hongju (born 1975), female, associate professor; main research interests: Chinese information processing and computer vision. JIN Xinyu, master's student.
  • Funding:
    National Natural Science Foundation of China (61976128); Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2019L0103); Shanxi Province 1331 Project.

Abstract: Information extraction aims to find specific information in natural language documents. Existing research on the entity relationship extraction and event extraction tasks addresses only overlapping event arguments and overlapping entity relationships; it does not consider the role overlap problem shared by the two tasks, which lowers extraction accuracy. This paper proposes a general two-stage model that handles both the entity relationship extraction and event extraction subtasks. Based on a shared feature representation from the pre-trained language model RoBERTa, the model predicts the entity relationship/event type and then the entity relationship/event arguments. The traditional trigger word extraction task is recast as multi-label event type classification, and a multi-scale neural network further extracts text features. On this basis, the model extracts the event arguments of the types relevant to the text and reweights the loss function according to the importance of each argument role, addressing data imbalance and the argument role overlap common to entity relationship extraction and event extraction. Experiments on the event extraction and relation extraction test sets of the Qianyan (LUGE) dataset verify the effectiveness of the model, which achieves F1 values of 83.1% and 75.3%, respectively.

Key words: event extraction, entity relationship extraction, role overlap, RoBERTa model, multi-label classification
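
The abstract does not give the form of the reweighted loss, so the following is only an illustrative sketch of one common way to combine the two ideas it names: independent (multi-label) decisions per role, so that one token can carry several overlapping roles, and a per-role weight in the loss. The function names, the sigmoid-plus-threshold decoding, and the weight values are assumptions made for illustration; they are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weighted_multilabel_bce(logits, targets, role_weights, eps=1e-12):
    """Mean role-weighted binary cross-entropy over independent role labels.

    logits, targets: lists of per-token lists, one entry per role.
    role_weights: one weight per role (hypothetical values; the paper's
    actual weighting scheme is not stated in the abstract).
    """
    total, count = 0.0, 0
    for token_logits, token_targets in zip(logits, targets):
        for z, y, w in zip(token_logits, token_targets, role_weights):
            # Clamp the probability so log() never receives 0.
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def predict_roles(token_logits, threshold=0.5):
    """Return every role index whose probability clears the threshold,
    so a single token may be assigned several overlapping roles."""
    return [i for i, z in enumerate(token_logits) if sigmoid(z) >= threshold]
```

Because each role is scored with its own sigmoid rather than a shared softmax, assigning one role does not suppress another, which is what allows overlapping roles to be recovered; raising a role's weight makes errors on that role cost more during training.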
