
Computer Engineering ›› 2021, Vol. 47 ›› Issue (6): 68-75. doi: 10.19678/j.issn.1000-3428.0058189

• Artificial Intelligence and Pattern Recognition •

Cross-Sentence Bag Relation Extraction Method Combining Entity Description Information

SUN Xin1, SHEN Changhong1, JIANG Jinghu1, CUI Jiaming2

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China;
  2. School of Information Science and Technology, Fudan University, Shanghai 200433, China
  • Received: 2020-04-28 Revised: 2020-06-26 Published: 2020-06-05
  • About the authors: SUN Xin (born 1975), female, associate professor, Ph.D.; research interests: natural language processing, machine learning, and deep learning. SHEN Changhong and JIANG Jinghu, master's students. CUI Jiaming, undergraduate student.
  • Supported by: National Key Research and Development Program of China (2017YFB0803300).
  • E-mail: sunxin@bit.edu.cn

Abstract: Distant supervision can significantly reduce labeling cost, but existing relation extraction methods ignore the correlations between relations and the background knowledge about entities. To address this, the paper proposes a new cross-sentence bag relation extraction method that incorporates entity description information. A Piecewise Convolutional Neural Network (PCNN) encodes the sentences, alleviating the error propagation problem in feature extraction. A cross-relation and cross-sentence bag attention mechanism is then designed to obtain relation features, so that valid instances are identified more reliably in the noisy data produced by distant supervision; this makes full use of the rich correlation information between relations and reduces the influence of noisy sentences. On this basis, a Convolutional Neural Network (CNN) extracts entity description information, which supplements the background knowledge required by the relation extraction task and provides better entity representations for the cross-relation and cross-sentence bag attention module. Experimental results on the public NYT dataset show that, on the sentence-level extraction task, the F1 score of the proposed method is about 4% higher than that of the piecewise convolutional method combining sentence attention and entity description information, indicating that the method effectively improves distantly supervised relation extraction.

Key words: relation extraction, entity description, cross-relation attention, cross-sentence bag attention, distant supervision
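
The abstract names two building blocks that a small example may make more concrete: piecewise pooling of convolution features around the two entities, and attention over the sentences in a bag. The sketch below is only an illustration under stated assumptions, not the authors' model. It shows generic piecewise max pooling and a plain dot-product sentence attention for a single relation query; it does not reproduce the paper's cross-relation and cross-sentence bag attention. The function names, the segment boundary convention, and the toy vectors are all hypothetical.

```python
import math


def piecewise_max_pool(feature_map, head_pos, tail_pos):
    """Max-pool one filter's outputs over three segments split by the entities.

    feature_map: list of floats, one value per token position.
    head_pos, tail_pos: token indices of the two entities (head_pos < tail_pos).
    Assumed boundary convention: each entity token closes its segment.
    """
    segments = [
        feature_map[: head_pos + 1],
        feature_map[head_pos + 1 : tail_pos + 1],
        feature_map[tail_pos + 1 :],
    ]
    # An empty segment (entity at the sentence boundary) contributes 0.0.
    return [max(seg) if seg else 0.0 for seg in segments]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_attention(sentence_vecs, relation_query):
    """Weight the sentences of one bag by dot-product match to a relation query.

    Returns the attention weights and the weighted-sum bag representation.
    """
    scores = [
        sum(x * q for x, q in zip(vec, relation_query)) for vec in sentence_vecs
    ]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    bag_vec = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vecs))
        for i in range(dim)
    ]
    return weights, bag_vec


# Toy usage with made-up numbers.
pooled = piecewise_max_pool([0.1, 0.7, 0.3, 0.9, 0.2, 0.4], head_pos=1, tail_pos=3)
print(pooled)  # [0.7, 0.9, 0.4]

weights, bag_vec = bag_attention([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
print(weights, bag_vec)
```

In this toy bag, the first sentence vector matches the query more closely and therefore receives the larger weight, which is the general idea behind down-weighting noisy sentences in distantly supervised data.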
