Abstract: Kernel-based methods for entity relation extraction represent the feature space implicitly inside the kernel function, so they cannot distinguish useful features from useless ones; this introduces noise and degrades performance. To address this problem, this paper proposes an entity relation extraction method based on subtree features. The method applies subtree mining and feature selection to obtain effective subtrees, which then serve as feature templates for constructing the feature vectors used in classification. Experimental results on a Chinese corpus show that the method achieves good classification performance.
Key words: entity relation extraction, phrase structure grammar, dependency grammar, feature selection, Chi-squared statistic
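The feature selection step named in the abstract and keywords can be illustrated with a minimal sketch. It assumes the subtree mining step has already produced, for each relation instance, a set of subtrees encoded as strings; the encoding, the function names (chi_squared, select_subtrees, to_vector), and the choice to score each subtree by its maximum chi-squared value over all relation classes are assumptions made for illustration and may differ from the procedure in the paper.

```python
from collections import Counter


def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic of a 2x2 contingency table.

    n11: instances of the class that contain the subtree
    n10: instances of other classes that contain the subtree
    n01: instances of the class that lack the subtree
    n00: instances of other classes that lack the subtree
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        # A subtree (or class) that covers all or no instances carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_subtrees(instances, labels, k):
    """Return the k subtrees with the highest chi-squared score.

    Assumption: a subtree's score is its maximum chi-squared value
    over all relation classes.
    """
    n = len(instances)
    class_count = Counter(labels)
    feat_count = Counter()
    joint = Counter()
    for feats, label in zip(instances, labels):
        for f in set(feats):
            feat_count[f] += 1
            joint[(f, label)] += 1

    scores = {}
    for f, nf in feat_count.items():
        best = 0.0
        for c, nc in class_count.items():
            n11 = joint[(f, c)]
            n10 = nf - n11
            n01 = nc - n11
            n00 = n - nf - n01
            best = max(best, chi_squared(n11, n10, n01, n00))
        scores[f] = best

    # Sort by descending score; break ties by name for a stable order.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k]


def to_vector(feats, templates):
    """Binary feature vector: 1 if the instance contains the template subtree."""
    return [1 if t in feats else 0 for t in templates]


# Toy data with hypothetical subtree encodings and relation labels.
toy_instances = [
    {"(NP (NR))", "(IP)"},
    {"(NP (NR))", "(IP)"},
    {"(VP (VV))", "(IP)"},
    {"(VP (VV))", "(IP)"},
]
toy_labels = ["Employment", "Employment", "Located", "Located"]
toy_templates = select_subtrees(toy_instances, toy_labels, k=2)
toy_vector = to_vector(toy_instances[2], toy_templates)
```

In this toy data the subtree "(IP)" occurs in every instance, so it scores zero and is filtered out, while the two class-specific subtrees are kept as templates. The resulting binary vectors could then be passed to any standard classifier.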
YAO Quan-Zhu (姚全珠), WANG Mei-Jun (王美君), LI Ru-Qiong (李如琼). Chinese Entity Relation Extraction Based on Subtree Feature (基于子树特征的中文实体关系抽取)[J]. Computer Engineering (计算机工程), 2012, 38(01): 48-50,54.