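Abstract: Traditional text-based algorithms for automatically extracting generic relations record only simple information, such as the positions and frequencies at which relations occur. They ignore a large amount of context information and therefore cannot effectively identify typical generic relations. To address this, a method for recognizing typical generic relations in Internet texts is proposed. Linguistic features and contextual semantic features of entity concepts are extracted to form an entity feature set. A naive Bayes classifier then computes the probability that an arbitrary entity belongs to each concept, and typical generic relations are identified from these probabilities. Experimental results show that, compared with a frequency-based recognition method, this method improves the recognition accuracy of typical generic relations by more than 5%.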
Keywords: Chinese knowledge base, generic relation, relation extraction, typicality, pattern recognition, naive Bayes
Abstract: Conventional methods for automatically extracting generic relations from text record only simple information, such as the positions and frequencies of relation occurrences, and ignore the large amount of context information that is very helpful for recognizing typical relations. A new approach is proposed to recognize typical generic relations among candidates extracted from Internet texts, preserving rich semantic information while the relations are captured. It combines linguistic features and contextual semantic features of entity concepts to form an entity feature set, uses a naive Bayes classifier to calculate the probability that any entity belongs to each candidate concept, and thereby recognizes typical generic relations. Experimental results show that, compared with a frequency-based recognition method, the proposed method improves the accuracy of judging whether a generic relation is typical by more than 5%.
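The abstract does not specify the feature set or the exact form of the classifier. As a minimal sketch, assuming a multinomial naive Bayes model with add-one (Laplace) smoothing over a bag of feature tokens per entity occurrence, the step of computing the probability that an entity belongs to each candidate concept could look like the following. The class name `NaiveBayesConceptScorer`, the `alpha` parameter, the feature tokens, and the toy training data are all hypothetical, and treating the highest-posterior concept as the "typical" one is an illustrative assumption rather than the paper's stated decision rule.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesConceptScorer:
    """Hypothetical multinomial naive Bayes scorer with Laplace smoothing.

    Each training example is (concept, features), where features is a list of
    string tokens (assumed stand-ins for linguistic and context features).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                         # smoothing constant (assumed)
        self.concept_counts = Counter()            # examples seen per concept
        self.feature_counts = defaultdict(Counter) # token counts per concept
        self.vocab = set()                         # all tokens seen in training

    def fit(self, examples):
        for concept, features in examples:
            self.concept_counts[concept] += 1
            self.feature_counts[concept].update(features)
            self.vocab.update(features)
        return self

    def log_posteriors(self, features):
        """Unnormalized log P(concept) + sum of log P(feature | concept)."""
        total = sum(self.concept_counts.values())
        v = len(self.vocab)
        scores = {}
        for concept, n in self.concept_counts.items():
            counts = self.feature_counts[concept]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n / total)
            for f in features:
                # Counter returns 0 for unseen tokens, so smoothing keeps log > -inf
                score += math.log((counts[f] + self.alpha) / denom)
            scores[concept] = score
        return scores

    def posteriors(self, features):
        """Normalize log scores into probabilities (log-sum-exp for stability)."""
        log_scores = self.log_posteriors(features)
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def most_typical(self, features):
        """Assumed rule: the concept with the highest posterior is 'typical'."""
        probs = self.posteriors(features)
        return max(probs, key=probs.get)


# Toy, made-up training data: concept label plus context feature tokens.
train = [
    ("fruit", ["eat", "sweet", "juice"]),
    ("fruit", ["eat", "tree", "ripe"]),
    ("company", ["iphone", "stock", "ceo"]),
    ("company", ["stock", "market", "launch"]),
]
model = NaiveBayesConceptScorer().fit(train)
print(model.posteriors(["eat", "sweet"]))  # fruit ≈ 0.857, company ≈ 0.143
print(model.most_typical(["stock"]))       # company
```

Working in log space avoids underflow when an entity has many features; a frequency-based baseline, by contrast, would rank concepts only by how often the entity co-occurs with them, without using the context tokens.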
Keywords: Chinese knowledge base, generic relation, relation extraction, typicality, pattern recognition, naive Bayes
CLC number: