
Computer Engineering ›› 2022, Vol. 48 ›› Issue (3): 46-53. doi: 10.19678/j.issn.1000-3428.0060745

• Artificial Intelligence and Pattern Recognition •

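Joint Entity and Relation Extraction Fusing Entity Type Information

CHEN Renjie1,2, ZHENG Xiaoying1,2, ZHU Yongxin1,2

  1. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-01-29  Revised: 2021-03-12  Published: 2021-03-15
  • About the authors: CHEN Renjie (born 1995), male, master's student, main research interest: natural language processing; ZHENG Xiaoying, associate researcher, Ph.D.; ZHU Yongxin (corresponding author), researcher, Ph.D.
  • Funding:
    National Key Research and Development Program of China (2019YFC0117302); National Natural Science Foundation of China (U2032125); Natural Science Foundation of Shanghai (19ZR1463900); Internal Talent Program of Shanghai Advanced Research Institute, Chinese Academy of Sciences (E052891ZZ1); Cooperation Project between Shanghai Advanced Research Institute and Shanghai Synchrotron Radiation Facility (E0560W1ZZ0).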



Abstract: Joint extraction methods based on an encoder-decoder structure can address the triplet overlapping problem in entity and relation extraction through sequence generation. However, existing methods do not make full use of entity type information, which is important for building richer semantic features and further improving the relation model. Building on the encoder-decoder structure, this paper constructs FETI, a joint entity and relation extraction model that fuses entity type information. The encoder adopts the classical Bidirectional Long Short-Term Memory (Bi-LSTM) structure, and the decoder uses tree decoding in place of traditional one-dimensional linear decoding. In addition, prediction of the head and tail entity types is added in the decoding stage and constrained by auxiliary loss functions, so that the model can use entity type information more effectively. Experiments on DuIE, a public Chinese dataset released by Baidu, show that FETI achieves an F1 value of 0.758, which is 2.02%~9.86% higher than those of the CopyMTL, WDec, MHS, and Seq2UMTree models, confirming that fusing entity type information improves the performance of entity and relation extraction models. Furthermore, experiments with different decoding orders and different loss weights show that the decoding order has a considerable impact on model performance, and that assigning a higher weight to the loss function of the main task ensures that the auxiliary tasks provide useful background knowledge for the main task while limiting the impact of noise.

Key words: entity and relation extraction, joint extraction, entity type information, triplet overlapping, encoder, decoder
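
The abstract's point about weighting the main-task loss above the auxiliary entity-type losses can be illustrated with a minimal sketch. The form of the combination (a convex weighted sum), the function names `cross_entropy` and `joint_loss`, the `main_weight` value, and all probabilities below are assumptions made for illustration; the abstract does not state FETI's actual loss formula or weights.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold_index])


def joint_loss(main_loss, aux_losses, main_weight=0.8):
    """Weighted sum of a main-task loss and the mean of auxiliary losses.

    main_weight is a hypothetical hyperparameter, not a value from the paper.
    A value above 0.5 lets the main task dominate the training signal.
    """
    if not 0.0 <= main_weight <= 1.0:
        raise ValueError("main_weight must lie in [0, 1]")
    aux_mean = sum(aux_losses) / len(aux_losses)
    return main_weight * main_loss + (1.0 - main_weight) * aux_mean


# Toy predicted distributions (made-up numbers, for illustration only).
main_task_loss = cross_entropy([0.7, 0.2, 0.1], gold_index=0)  # triple extraction
head_type_loss = cross_entropy([0.5, 0.5], gold_index=1)       # auxiliary: head entity type
tail_type_loss = cross_entropy([0.9, 0.1], gold_index=0)       # auxiliary: tail entity type

total = joint_loss(main_task_loss, [head_type_loss, tail_type_loss], main_weight=0.8)
print(f"total loss: {total:.4f}")
```

With `main_weight` close to 1, errors in entity type prediction contribute little to the total, which is one simple way to let an auxiliary task inform the main task without letting its noise dominate.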
