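Abstract: Existing methods for extracting hyponymy relations between domain ontology concepts are constrained by manual annotation and by fixed patterns. To address this problem, a hyponymy extraction method for domain ontology concepts based on cascaded conditional random fields is proposed. The method takes free text as input and uses a two-layer conditional random field algorithm, converting the training data into a linear structure that conditional random fields can process. The low-level conditional random field model takes long-distance dependencies between words into account and models the text at the word level: it identifies domain concepts, combines the concepts in sequence, and uses template-defined features to obtain concept pairs. The high-level model labels each concept pair with hyponymy semantics, thereby identifying the hyponymy relations between domain ontology concepts. Experiments on a real corpus show that the method achieves good recognition performance.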
Keywords:
cascaded conditional random fields,
domain ontology concept,
hyponymy,
concept pair,
relation extraction
Abstract: Existing hyponymy extraction methods for domain ontology concepts are limited by manual annotation and specific patterns. To address this problem, this paper proposes a hyponymy extraction method for domain ontology concepts based on the Cascaded Conditional Random Field (CCRF). It takes free text as the extraction object and adopts a two-layer conditional random field model. The low-level conditional random field model considers long-distance dependencies between words, models the text at the word level, identifies domain concepts and combines them in sequence, and then obtains concept pairs using template-defined features. The high-level model annotates each concept pair with hyponymy semantics and thereby identifies the hyponymy relations between domain ontology concepts. In open tests on a real corpus, the experimental results show that the proposed method achieves good recognition performance.
Key words:
Cascaded Conditional Random Field (CCRF),
domain ontology concept,
hyponymy,
concept pair,
relation extraction
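The hand-off between the two layers described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the low-level model emits BIO-style tags (`B-CON`, `I-CON`, `O`) and that "combining concepts in sequence" means pairing each concept with every later concept in the same sentence; the tag names, the function names `extract_concepts` and `sequential_pairs`, and the example sentence are all hypothetical and are not taken from the paper. The tags are hard-coded here rather than predicted by a trained model.

```python
from typing import List, Tuple


def extract_concepts(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect concept spans from BIO tags (assumed output of the low-level layer)."""
    concepts: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-CON":
            # A new concept starts; close any concept still open.
            if current:
                concepts.append(" ".join(current))
            current = [token]
        elif tag == "I-CON" and current:
            # Continue the concept that is currently open.
            current.append(token)
        else:
            # Outside a concept (or a stray I-CON): close any open concept.
            if current:
                concepts.append(" ".join(current))
            current = []
    if current:
        concepts.append(" ".join(current))
    return concepts


def sequential_pairs(concepts: List[str]) -> List[Tuple[str, str]]:
    """Pair each concept with every later one, preserving text order (an assumed reading)."""
    return [
        (concepts[i], concepts[j])
        for i in range(len(concepts))
        for j in range(i + 1, len(concepts))
    ]


# Invented example sentence with hand-written tags standing in for model output.
tokens = ["scenic", "spots", "include", "mountains", "and", "lakes"]
tags = ["B-CON", "I-CON", "O", "B-CON", "O", "B-CON"]

concepts = extract_concepts(tokens, tags)
pairs = sequential_pairs(concepts)  # candidate pairs passed to the high-level layer
```

In this reading, each tuple in `pairs` would become one item for the high-level model to label as a hyponymy relation or not; how the paper actually encodes the pairs and their features is not specified in the abstract.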
Chinese Library Classification number:
莫媛媛,郭剑毅,余正涛,蒋年树,线岩团. 基于CCRF的领域本体概念上下位关系抽取[J]. 计算机工程.
MO Yuan-yuan, GUO Jian-yi, YU Zheng-tao, JIANG Nian-shu, XIAN Yan-tuan. Hyponymy Extraction of Domain Ontology Concept Based on CCRF[J]. Computer Engineering.