Computer Engineering

• Artificial Intelligence and Recognition Technology •

Chinese Text Classification by Domain Ontology Graph Based on Concept Clustering

YE Shiren, SUN Ning

  1. (School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China)
  • Received: 2015-11-16 Online: 2016-12-15 Published: 2016-12-15
  • About the authors: YE Shiren (born 1970), male, senior engineer, Ph.D., main research interest: data mining; SUN Ning, master's student.
  • Foundation item:
    National Natural Science Foundation of China (61272367).

Abstract: This paper proposes an improved Chinese text classification algorithm that uses a Domain Ontology Graph (DOG) and is based on semi-supervised concept clustering. Following the DOG structure model, an ontology learning framework for Chinese text classification is built; the HowNet dictionary is used to extract terms, and a Chinese term-term relationship mapping is established. Based on the weighted connections between terms, the KLSeeker ontology-based Chinese text classification algorithm is designed around binary classification relationships, and accurate classification of Chinese text is achieved through semi-supervised learning of the ontology graph based on concept clustering. Experimental results show that the proposed algorithm achieves higher classification accuracy than a Chinese text classification algorithm based on non-negative tensor decomposition.

Key words: word disambiguation, semi-supervised, concept clustering, HowNet dictionary, binary classification relationship, Domain Ontology Graph (DOG)

CLC number: