
Computer Engineering (计算机工程)

• Architecture and Software Technology •

A Recognition Method for Typical Generic Relationships in Massive Chinese Text

LIU Qi, XIAO Yanghua, WANG Wei

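  1. (School of Computer Science, Fudan University, Shanghai 201203)
  • Received: 2014-03-11  Publication date: 2015-02-15  Release date: 2015-02-13
  • About the authors: LIU Qi (1988- ), male, master's student, research interests: data extraction and natural language processing; XIAO Yanghua, associate professor; WANG Wei, professor and doctoral supervisor.
  • Funding:
    Supported by the National Natural Science Foundation of China (61003001, 61170006, 6117132, 61033010).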

A Recognition Approach of Typical Generic Relationship for Massive Chinese Text

LIU Qi, XIAO Yanghua, WANG Wei

  1. (School of Computer Science, Fudan University, Shanghai 201203, China)
  • Received: 2014-03-11 Online: 2015-02-15 Published: 2015-02-13

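Abstract: Traditional text-based algorithms for automatically extracting generic relationships record only simple information, such as where and how often a relationship occurs, and ignore a large amount of contextual information; as a result, they cannot effectively identify typical generic relationships. To address this, a method for recognizing typical generic relationships in Internet text is proposed. Linguistic features and contextual semantic features of entities and concepts are extracted to form an entity feature set, and a naive Bayes classifier then computes the likelihood that any given entity belongs to each candidate concept, from which typical generic relationships are identified. Experimental results show that, compared with a frequency-based recognition method, the proposed method improves the recognition accuracy of typical generic relationships by more than 5%.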
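The abstract states that a naive Bayes classifier turns an entity's feature set into probabilities over candidate concepts, but this page does not give the actual features or model details. The sketch below is a minimal illustration under stated assumptions: a multinomial naive Bayes with Laplace smoothing, in which each concept is a class and each entity is represented as a bag of feature strings. The class `ConceptNaiveBayes`, the helper `typical_concepts`, the feature names, the training data, and the 0.5 threshold are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class ConceptNaiveBayes:
    """Hypothetical sketch: multinomial naive Bayes where classes are concepts
    and each entity is a bag of (assumed) linguistic/context feature strings."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant (assumed)
        self.concept_counts = Counter()             # training entities per concept
        self.feature_counts = defaultdict(Counter)  # feature occurrences per concept
        self.vocab = set()                          # all features seen in training

    def fit(self, samples):
        # samples: iterable of (feature_list, concept) pairs
        for features, concept in samples:
            self.concept_counts[concept] += 1
            self.feature_counts[concept].update(features)
            self.vocab.update(features)
        return self

    def log_joint(self, features, concept):
        # log P(c) + sum over features of log P(f | c), with Laplace smoothing
        total = sum(self.concept_counts.values())
        score = math.log(self.concept_counts[concept] / total)
        counts = self.feature_counts[concept]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for f in features:
            if f not in self.vocab:
                continue  # features never seen in training carry no evidence
            score += math.log((counts[f] + self.alpha) / denom)
        return score

    def posterior(self, features):
        # Normalize the joint scores into P(c | features) in a numerically stable way
        logs = {c: self.log_joint(features, c) for c in self.concept_counts}
        top = max(logs.values())
        weights = {c: math.exp(v - top) for c, v in logs.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


def typical_concepts(model, features, threshold=0.5):
    # Hypothetical rule: treat (entity, concept) as typical if P(c | e) >= threshold
    post = model.posterior(features)
    return sorted((c for c, p in post.items() if p >= threshold), key=lambda c: -post[c])


# Toy usage with made-up context features
model = ConceptNaiveBayes().fit([
    (["ctx:eat", "ctx:sweet"], "fruit"),
    (["ctx:eat", "ctx:juice"], "fruit"),
    (["ctx:stock", "ctx:CEO"], "company"),
])
post = model.posterior(["ctx:eat", "ctx:juice"])
print(post)
print(typical_concepts(model, ["ctx:eat", "ctx:juice"]))
```

Working in log space avoids numerical underflow when an entity has many context features; the posterior is renormalized only over concepts seen in training, so a real system would also need a candidate-concept set for each entity.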

Key words: Chinese knowledge base, generic relationship, relationship extraction, typicality, pattern recognition, naive Bayes

Abstract: Conventional methods for automatically extracting generic relationships from text record only simple information, such as positions and frequencies, and ignore the large amount of context information that is very helpful for recognizing typical relationships. A new approach is proposed to recognize typical generic relationships from candidates extracted from Internet texts, retaining abundant semantic information while the relations are captured. It combines natural language features of entities and concepts into an entity feature set, uses a naive Bayes classifier to calculate the probability that any entity belongs to each concept, and thereby recognizes typical generic relationships. Experimental results show that, in judging whether a generic relationship is typical, the method improves recognition accuracy by more than 5% compared with the frequency-based recognition method.
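The frequency-based baseline used for comparison is not specified on this page. One common formulation, assumed here purely for illustration, scores an (entity, concept) pair by the share of the entity's extracted occurrences that name that concept, n(e, c) / Σ_c' n(e, c'). The function names `frequency_typicality` and `most_typical_concept` and the sample occurrences are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_typicality(pairs):
    """Assumed baseline: score each (entity, concept) pair by
    n(e, c) / sum over c' of n(e, c'), using raw extraction counts only."""
    counts = Counter(pairs)            # occurrences of each (entity, concept) pair
    per_entity = defaultdict(int)      # total occurrences per entity
    for (entity, _concept), n in counts.items():
        per_entity[entity] += n
    return {(e, c): n / per_entity[e] for (e, c), n in counts.items()}


def most_typical_concept(scores, entity):
    # Pick the concept with the highest frequency share for this entity
    candidates = {c: s for (e, c), s in scores.items() if e == entity}
    return max(candidates, key=candidates.get)


# Toy usage: one entry per extracted occurrence (made-up data)
occurrences = [("apple", "fruit")] * 3 + [("apple", "company")] + [("banana", "fruit")] * 2
scores = frequency_typicality(occurrences)
print(scores)
print(most_typical_concept(scores, "apple"))
```

Because such a score depends only on counts, it cannot tell a frequent but atypical pairing from a typical one; that is the limitation the abstract attributes to frequency-based methods.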

Key words: Chinese knowledge base, generic relationship, relationship extraction, typicality, pattern recognition, naive Bayesian
