Computer Engineering (计算机工程) ›› 2024, Vol. 50 ›› Issue (10): 164-173. doi: 10.19678/j.issn.1000-3428.0068728

• Artificial Intelligence and Pattern Recognition •

Document-Level Relation Extraction Based on Dual-Granularity Graphs (基于双粒度图的文档级关系抽取)

LIAO Tao (廖涛)1, ZHANG Guochang (张国畅)1,*, ZHANG Shunxiang (张顺香)1,2

  1. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China
  2. Artificial Intelligence Research Institute of Hefei Comprehensive National Science Center, Hefei 230000, Anhui, China
  • Received: 2023-10-30 Published: 2024-10-15 Online: 2024-03-06
  • Contact: ZHANG Guochang
  • Funding:
    General Program of the National Natural Science Foundation of China (62076006); Anhui Province University Collaborative Innovation Project (GXXT-2021-008)

Abstract:

Document-level relation extraction aims to extract the relations between entity pairs from unstructured text. Existing methods underuse document semantics and struggle with noise in the document. To address both problems, this study proposes a relation extraction model based on dual-granularity document graphs, with a new graph construction approach and a noise reduction method designed at the inter-sentence and intra-sentence levels. At the inter-sentence level, a rhetorical discourse relation graph (RST-graph) is built from rhetorical discourse relations and entity mention relations, and asynchronous noise reduction is applied to generate a Coarse-Grained Document graph (CGD-graph). This mitigates the structural mispruning that arises because the inter-sentence relation paths of an entity pair are longer than its intra-sentence paths. At the intra-sentence level, each sentence is parsed with dependency syntax relations into a Dependency Syntax Tree (DST), which enriches intra-sentence semantic information. Finally, the DSTs are connected to the CGD-graph at their common anchor points to form a Fine-Grained Document graph (FGD-graph). Experimental results show that, compared with the Denoising Graph Inference (DGI) model, the proposed model improves Ign F1 and F1 by 0.40 and 0.51 percentage points, respectively. On entity pairs with multiple relation labels, the improvement becomes more pronounced as the number of labels increases.
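
The two-level construction described in the abstract can be pictured with a small sketch. The Python below is illustrative only and is not the authors' implementation: the function names (`build_coarse_graph`, `attach_dependency_trees`, `hops`), the node encoding, the toy sentences, the partial dependency arcs, and the choice of a mention's head token as the "common anchor" are all assumptions made for this example. The asynchronous noise reduction step is omitted entirely.

```python
from collections import defaultdict, deque


def add_edge(graph, u, v, label):
    """Add an undirected, labelled edge to an adjacency-dict graph."""
    graph[u][v] = label
    graph[v][u] = label


def build_coarse_graph(discourse_links, mentions):
    """Sentence-level graph: discourse links between sentences plus
    mention-to-sentence edges (a stand-in for the coarse-grained graph)."""
    graph = defaultdict(dict)
    for s1, s2, rel in discourse_links:
        add_edge(graph, ("sent", s1), ("sent", s2), rel)
    for mention, sent_id, _token_idx in mentions:
        add_edge(graph, ("mention", mention), ("sent", sent_id), "mention-in")
    return graph


def attach_dependency_trees(coarse, dep_trees, mentions):
    """Copy the coarse graph, add per-sentence dependency arcs, and join
    the two at shared anchors (assumed here: each mention's head token)."""
    fine = defaultdict(dict)
    for u, neighbours in coarse.items():
        for v, label in neighbours.items():
            fine[u][v] = label
    for sent_id, arcs in dep_trees.items():
        for head, dep, rel in arcs:
            add_edge(fine, ("tok", sent_id, head), ("tok", sent_id, dep), rel)
    for mention, sent_id, token_idx in mentions:
        add_edge(fine, ("mention", mention), ("tok", sent_id, token_idx), "anchor")
    return fine


def hops(graph, src, dst):
    """Breadth-first search: number of edges on a shortest path, or None."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy document (hypothetical): sentence 0 = "Marie Curie was born in Warsaw",
# sentence 1 = "She won the Nobel Prize". Token indices are 0-based.
mentions = [("Marie Curie", 0, 1), ("Warsaw", 0, 5), ("Nobel Prize", 1, 4)]
discourse_links = [(0, 1, "Elaboration")]
dep_trees = {
    0: [(3, 1, "nsubj:pass"), (3, 5, "obl")],  # born -> Curie, born -> Warsaw
    1: [(1, 0, "nsubj"), (1, 4, "obj")],       # won -> She, won -> Prize
}

coarse = build_coarse_graph(discourse_links, mentions)
fine = attach_dependency_trees(coarse, dep_trees, mentions)
print(hops(fine, ("mention", "Marie Curie"), ("mention", "Nobel Prize")))
```

In this toy setup, entity mentions in different sentences are linked only through sentence nodes, while token-level dependency arcs add detail inside each sentence. That is the intuition behind keeping a coarse inter-sentence graph and a finer intra-sentence layer, although the paper's actual node types, edge types, and pruning may differ.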

Key words: document-level, relation extraction, dual-granularity document graph, asynchronous noise reduction, rhetorical discourse relation, dependency syntax relation
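
The abstract reports gains in both Ign F1 and F1 without spelling out how the two scores differ. The sketch below assumes the convention commonly used in document-level relation extraction benchmarks, where Ign F1 discounts correctly predicted facts that also occur in the training data when computing precision. The function names, the fact encoding as (head, tail, relation) tuples, and the toy data are hypothetical, and the paper's own evaluation script may differ in detail.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def f1_and_ign_f1(predicted, gold, train_facts):
    """Return (F1, Ign F1) for sets of (head, tail, relation) facts.

    Assumed convention: Ign precision removes correct predictions that
    were already seen in training from numerator and denominator, while
    recall is left unchanged.
    """
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0

    seen = correct & train_facts
    ign_denominator = len(predicted) - len(seen)
    ign_precision = (
        (len(correct) - len(seen)) / ign_denominator if ign_denominator else 0.0
    )
    return f1(precision, recall), f1(ign_precision, recall)


# Toy data (hypothetical): 4 gold facts, 4 predictions, 3 of them correct,
# and 1 of the correct ones also appears in the training set.
gold = {
    ("Marie Curie", "Warsaw", "place_of_birth"),
    ("Marie Curie", "Nobel Prize", "award_received"),
    ("Warsaw", "Poland", "country"),
    ("Marie Curie", "Poland", "citizenship"),
}
predicted = {
    ("Marie Curie", "Warsaw", "place_of_birth"),
    ("Marie Curie", "Nobel Prize", "award_received"),
    ("Warsaw", "Poland", "country"),
    ("Nobel Prize", "Poland", "country"),  # wrong prediction
}
train_facts = {("Warsaw", "Poland", "country")}

score, ign_score = f1_and_ign_f1(predicted, gold, train_facts)
print(f"F1 = {score:.4f}, Ign F1 = {ign_score:.4f}")
```

Under this convention Ign F1 is never higher than F1, because facts memorized from training no longer count toward precision.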