Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 100-111. doi: 10.19678/j.issn.1000-3428.0069745

• Artificial Intelligence and Pattern Recognition •

Complex Entity Relation Extraction Method for Copper-Based Composite Material Literatures

GUO Huayi1, YOU Jinguo1,*, GENG Qiqi1, TAO Jingmei2, YI Jianhong2

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  2. School of Materials Science and Engineering, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2024-04-15 Revised: 2024-05-21 Online: 2025-11-15 Published: 2024-08-21
  • Contact: YOU Jinguo
  • Supported by:
    National Natural Science Foundation of China (62062046)

Abstract:

Effectively extracting entities and relations from the literature on copper-based composite materials is important for constructing materials knowledge graphs and advancing materials science research. Entities in this domain have complex structures, such as nested and discontinuous entities, and Single Entity Overlap (SEO) relations are common, so existing entity and relation extraction techniques cannot be applied directly. To address this, the study constructs an entity relation extraction dataset for copper-based composite materials and proposes a two-stage extraction method. The first stage combines an inter-word relation classification task with a Bidirectional Gated Recurrent Unit (BiGRU) and multi-granularity dilated convolutions to improve the entity recognition model's ability to identify entity spans. The second stage marks entity information in the text sequence and introduces an entity type attention mechanism into the relation classification model, using multi-feature representations to improve relation classification. Experiments on three public datasets (Matscholar, SOFC, and MSP) and the self-built CBCM-IE dataset show that, compared with baseline methods, the proposed method improves precision, recall, and F1 score by an average of 5.91, 3.56, and 3.63 percentage points, respectively.
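The inter-word relation classification used in the first stage can be illustrated with a small decoding sketch. It assumes a W2NER-style labeling scheme, which the abstract does not confirm: a "next-neighboring-word" link joins consecutive words of the same entity, and a "tail-head-word" link marks an entity's last and first word together with its type. The function name `decode_entities`, the link variables, the type labels, and the example sentence are all hypothetical; the BiGRU and dilated-convolution encoder that would predict the links is not shown.

```python
from collections import defaultdict

def decode_entities(nnw_links, thw_links):
    """Recover entities from word-pair links (assumed W2NER-style scheme).

    nnw_links: set of (i, j), i < j, meaning word j follows word i in an entity.
    thw_links: set of (tail, head, entity_type) marking an entity's boundaries.
    Returns a set of (word_index_tuple, entity_type).
    """
    successors = defaultdict(list)
    for i, j in nnw_links:
        successors[i].append(j)

    def paths(current, tail):
        # Enumerate every chain of links from `current` to `tail`.
        if current == tail:
            return [[current]]
        found = []
        for nxt in sorted(successors[current]):
            if nxt <= tail:
                found.extend([current] + rest for rest in paths(nxt, tail))
        return found

    entities = set()
    for tail, head, entity_type in thw_links:
        for path in paths(head, tail):
            entities.add((tuple(path), entity_type))
    return entities

# Hypothetical sentence with two discontinuous entities sharing "matrix composites"
# and one nested single-word entity ("Al").
tokens = ["Cu", "and", "Al", "matrix", "composites"]
nnw_links = {(0, 3), (2, 3), (3, 4)}
thw_links = {(4, 0, "Material"), (4, 2, "Material"), (2, 2, "Element")}

entities = decode_entities(nnw_links, thw_links)
for indices, entity_type in sorted(entities):
    print(entity_type, [tokens[i] for i in indices])
```

Because entities are recovered as paths through the link grid rather than as contiguous spans, this kind of scheme can represent nested and discontinuous entities in the same way as flat ones.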

Key words: named entity recognition, relation extraction, pretrained language model, copper-based composite material
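The second stage described in the abstract marks entity information in the text sequence before relation classification. A minimal sketch of one common way to do this, typed marker tokens around a candidate subject and object, is given below. The marker format (`<S:Type>`, `<O:Type>`), the function name `mark_entities`, the entity types, and the example sentence are assumptions for illustration, not the paper's scheme; spans are assumed contiguous, and the entity type attention mechanism is not shown.

```python
def mark_entities(tokens, subject, obj):
    """Insert typed markers around a candidate subject/object pair.

    subject, obj: (start, end_exclusive, entity_type) over token indices.
    Returns a new token list with opening and closing marker tokens.
    """
    spans = [("S", *subject), ("O", *obj)]
    marked = []
    for pos in range(len(tokens) + 1):
        # Close spans ending here, innermost (latest start) first.
        for role, start, end, etype in sorted(spans, key=lambda s: -s[1]):
            if end == pos:
                marked.append(f"</{role}:{etype}>")
        # Open spans starting here, outermost (latest end) first.
        for role, start, end, etype in sorted(spans, key=lambda s: -s[2]):
            if start == pos:
                marked.append(f"<{role}:{etype}>")
        if pos < len(tokens):
            marked.append(tokens[pos])
    return marked

# Hypothetical sentence and entity types.
tokens = ["graphene", "improves", "the", "hardness", "of", "Cu", "composites"]
marked = mark_entities(tokens, (0, 1, "Reinforcement"), (3, 4, "Property"))
print(" ".join(marked))
```

For a Single Entity Overlap case, where two relations share one entity, such a function would be called once per candidate entity pair, giving the relation classifier one marked sequence per pair.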