
Computer Engineering ›› 2023, Vol. 49 ›› Issue (6): 107-114. doi: 10.19678/j.issn.1000-3428.0065261

• Artificial Intelligence and Pattern Recognition •

Entity-Relation Joint Extraction Model Based on Span and Feature Fusion

LIAO Tao, SUN Haojie, ZHANG Shunxiang

  1. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China
  • Received: 2022-07-15 Revised: 2022-08-05 Published: 2022-09-30
  • About the authors: LIAO Tao (born 1977), male, associate professor, Ph.D., main research interest: Web data mining; SUN Haojie, master's student; ZHANG Shunxiang, professor, Ph.D.
  • Funding:
    General Program of the National Natural Science Foundation of China (62076006); University Synergy Innovation Program of Anhui Province (GXXT-2021-008); General Program of the Natural Science Foundation of Anhui Province (1908085MF189).

Abstract: Joint entity-relation extraction models play an important role in entity-relation extraction, but existing joint models cannot effectively identify entity-relation triples that involve overlapping relations. This paper proposes SFFM, a novel joint entity-relation extraction model based on span and feature fusion. The input text is first converted into word vectors by the BERT pre-trained model. The word vectors are then divided by span to form span sequences, and a Convolutional Neural Network (CNN) filters out the span sequences that contain no entities. Bi-directional Long Short-Term Memory (Bi-LSTM) extracts features from the remaining span sequences after they are fused with text information, and Softmax regression performs entity recognition. Span division maps the entities and relations in the text to different span sequences. When an entity in an overlapping relation is related to a distant entity, dividing by span places the entity pairs that may be related in the same span sequence, which makes better use of the overlapping relations in the text. On this basis, an attention mechanism captures the dependencies within each span sequence, and Softmax regression classifies the relations in the span sequence. Experimental results show that, compared with the baseline models, the proposed model improves the micro-average and macro-average on the CoNLL04 dataset by 1.87 and 1.73 percentage points, respectively, and improves the micro-average on the SciERC dataset by 5.95 percentage points.

Key words: joint extraction, entity-relation extraction, neural network, span, feature fusion
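
The span-based candidate generation described in the abstract can be illustrated with a minimal sketch. It assumes the common span-based formulation in which every contiguous token window up to a maximum width is a candidate span. The abstract does not specify how SFFM divides spans, so `enumerate_spans`, `candidate_pairs`, `max_width`, and the example sentence are hypothetical names chosen for illustration. The hand-picked entity spans stand in for the CNN filter and Bi-LSTM classifier, which are not reproduced here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_width: int) -> List[Span]:
    """Return every contiguous token window of length 1..max_width."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def candidate_pairs(entity_spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Pair every two distinct entity spans, in both directions.

    Assumption: each ordered pair is a candidate for relation classification.
    """
    return [(a, b) for a in entity_spans for b in entity_spans if a != b]


# Hypothetical example sentence (not taken from the paper's datasets).
tokens = ["John", "Smith", "lives", "in", "New", "York",
          "and", "works", "for", "Acme"]
spans = enumerate_spans(tokens, max_width=2)

# Stand-in for the entity recognition step: spans assumed to survive filtering.
entities = [(0, 2), (4, 6), (9, 10)]  # "John Smith", "New York", "Acme"
pairs = candidate_pairs(entities)

for head, tail in pairs:
    print(" ".join(tokens[head[0]:head[1]]), "->",
          " ".join(tokens[tail[0]:tail[1]]))
```

In this sketch the span `(0, 2)` appears in more than one candidate pair, which is the overlapping case the abstract targets: a single entity can take part in several triples, including one with a distant entity such as `(9, 10)`.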

CLC number: