Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

Open Chinese Entity Relation Extraction Method Based on Dependency Parsing (基于依存分析的开放式中文实体关系抽取方法)

LI Mingyao (李明耀) 1,2, YANG Jing (杨静) 1,2

  1. Shanghai Key Laboratory of Multidimensional Information Processing, Shanghai 200241, China
  2. Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China
  • Received: 2015-05-25 Online: 2016-06-15 Published: 2016-06-15
  • About the authors: LI Mingyao (born 1989), male, master's student, whose main research interests are data mining and information extraction; YANG Jing, associate professor, Ph.D.
  • Supported by:
    Science and Technology Commission of Shanghai Municipality fund project (14511107000).

Abstract: Entity relation extraction is a component of Information Extraction (IE); its goal is to determine whether a semantic relation holds between entities. Chinese grammar is intricate, its expressions are flexible, and its semantics are varied, so using a bare verb as the relation expression in Chinese tends to leave the relation between entities ambiguous. To address this, this paper proposes an open Chinese entity relation extraction method based on dependency parsing. The method performs dependency parsing on each input single sentence and uses the resulting dependency arcs to judge whether the sentence is a verb-predicate sentence. If it is, the relation expression is extracted by combining the parse with heuristic rules drawn from Chinese grammar. Argument positions are then determined according to distance, the candidate triples are evaluated, and the triples that meet the conditions are output. Experimental results on the SogouCA and SogouCS corpora show that the proposed method is suitable for large-scale corpora and has good performance and portability. Compared with an unsupervised hierarchical clustering method based on the convolution tree kernel, the F-measure is improved by 16.68%.

Keywords: Open Information Extraction (OIE), Chinese entity relation extraction, dependency parsing, unsupervised, heuristic rule
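The pipeline summarized in the abstract (parse, test for a verb-predicate sentence, pick a relation expression, locate arguments by distance, filter triples) can be illustrated with a minimal sketch. It is not the authors' implementation. The `Token` type, the coarse POS tags `"n"` and `"v"`, the hand-written parses, and the helper names are all hypothetical. The abstract does not spell out the heuristic rules or the evaluation criteria, so the sketch uses the bare root verb as the relation expression and simply requires one entity on each side of it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    word: str
    pos: str    # assumed coarse tag: "n" = noun/entity, "v" = verb
    head: int   # index of the head token; 0 marks the root arc


def find_verb_predicate(tokens: List[Token]) -> Optional[Token]:
    """Return the root token if it is a verb, i.e. the sentence is
    treated as a verb-predicate sentence; otherwise return None."""
    for t in tokens:
        if t.head == 0 and t.pos == "v":
            return t
    return None


def nearest_entity(tokens: List[Token], pred: Token, side: int) -> Optional[Token]:
    """Pick the noun closest to the predicate on one side
    (side = -1 for left, +1 for right). Assumed distance rule."""
    candidates = [t for t in tokens
                  if t.pos == "n" and (t.idx - pred.idx) * side > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t.idx - pred.idx))


def extract_triple(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (arg1, relation, arg2), or None if the sentence is not a
    verb-predicate sentence or an argument is missing."""
    pred = find_verb_predicate(tokens)
    if pred is None:
        return None
    left = nearest_entity(tokens, pred, -1)
    right = nearest_entity(tokens, pred, +1)
    # Stand-in for the triple evaluation step: both arguments must exist.
    if left is None or right is None:
        return None
    return (left.word, pred.word, right.word)


# Hand-written toy parses; a real system would obtain them from a parser.
verb_sentence = [
    Token(1, "华东师范大学", "n", 2),
    Token(2, "位于", "v", 0),
    Token(3, "上海", "n", 2),
]
noun_sentence = [
    Token(1, "今天", "n", 2),
    Token(2, "星期三", "n", 0),
]

if __name__ == "__main__":
    print(extract_triple(verb_sentence))
    print(extract_triple(noun_sentence))
```

In this sketch the second sentence yields no triple because its root is a noun, which mirrors the abstract's step of skipping sentences that are not verb-predicate sentences.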

CLC number: