
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 77-84. doi: 10.19678/j.issn.1000-3428.0064545

• Artificial Intelligence and Pattern Recognition •

Joint Extraction of Binary Tagging Entity Relation for Enhanced Semantic and Syntactic Information

HENG Hongjun, MIAO Jing

  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2022-04-24 Revised: 2022-06-01 Published: 2022-08-09
  • About the authors: HENG Hongjun (born 1968), male, associate professor, Ph.D.; main research interests: intelligent information processing, natural language processing, and knowledge graphs. MIAO Jing, master's student.
  • Funding:
    Joint Fund of the National Natural Science Foundation of China (U1333109).

Abstract: With the continuing development of Internet technology, the volume of data is growing explosively, and extracting key information efficiently from massive data has become an urgent need. Entity relation extraction, the core task of information extraction, plays an irreplaceable role. Existing deep-learning-based entity relation extraction methods, however, suffer from error accumulation, entity redundancy, missing interaction, and overlapping entity relations. To make full use of the semantic and syntactic information in a sentence, this paper proposes SSERel, a binary-tagging model for joint entity and relation extraction that strengthens semantic and syntactic information. The model encodes the input text with BERT and predicts tags for the start and end positions of each triple's subject. It then extracts the global semantic features of the text, the local semantic features between the subject and each word, and the syntactic features, and fuses them into the encoding vector. Finally, it predicts tags for the object positions under each relation of the sentence, which completes triple extraction. Experiments on the NYT and WebNLG datasets show that, compared with the CasRel model, SSERel improves the F1 value by 2.7 and 1.4 percentage points, respectively, and that it effectively handles overlapping triples and multiple triples in complex data.

Key words: information extraction, joint extraction of entity relation, semantic information, syntactic dependency analysis, graph convolutional neural network (GCNN)
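
The binary tagging step named in the abstract (predicting the start and end positions of a triple's subject) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decode_spans`, the 0.5 threshold, the probability values, and the example sentence are all hypothetical, and the pairing rule (each tagged start is matched with the nearest tagged end at or after it) is an assumption borrowed from the usual convention in CasRel-style taggers, which the abstract does not spell out.

```python
from typing import List, Tuple


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Turn per-token start/end probabilities into (start, end) spans.

    Each position whose start probability exceeds the threshold is paired
    with the nearest position at or after it whose end probability also
    exceeds the threshold. Starts with no such end are dropped.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]

    spans = []
    for s in starts:
        later_ends = [e for e in ends if e >= s]
        if later_ends:
            spans.append((s, later_ends[0]))
    return spans


# Hypothetical tagger outputs for a six-token sentence.
tokens = ["Jackie", "Brown", "was", "born", "in", "Washington"]
start_probs = [0.9, 0.1, 0.0, 0.1, 0.0, 0.8]
end_probs = [0.1, 0.95, 0.0, 0.0, 0.1, 0.7]

spans = decode_spans(start_probs, end_probs)
entities = [" ".join(tokens[s : e + 1]) for s, e in spans]
print(entities)  # ['Jackie Brown', 'Washington']
```

In a model of this family, the same decoding would presumably be applied once more per relation type to recover the objects for each detected subject, so a sentence with overlapping triples simply yields several spans under different relations.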

CLC number: