
Computer Engineering, 2007, Vol. 33, Issue (12): 96-98. doi: 10.3969/j.issn.1000-3428.2007.12.034

• Software Technology and Database •

Method of Unlabeled Texts Classification

JIANG Zhifang1, ZHU Cuiling2, WU Qiang1

(1. School of Computer Science and Technology, Shandong University, Jinan 250061; 2. College of Information Management, Shandong Economic University, Jinan 250014)
Online: 2007-06-20  Published: 2007-06-20

Abstract: Exploiting the respective strengths of unsupervised clustering (UC) and naïve Bayes classification, this paper presents a method that uses part of the pre-classification produced by UC as training samples for a naïve Bayes classifier, and then lets the trained classifier reclassify the texts that fall in the fuzzy region of the clustering result, where cluster membership is unclear. The method thus classifies texts that carry no category labels at all and yields more accurate classification results.

Key words: Text classification, Unsupervised text clustering, Naïve Bayes classification, Euclidean distance
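The two-stage pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the feature weighting, or how the fuzzy region is defined, so the sketch assumes raw term-frequency vectors, k-means with Euclidean distance seeded with the first k texts, and a hypothetical fuzzy-region test that flags a text when its distance to the nearest centroid is at least `ratio` times its distance to the second-nearest one. All function names and the `ratio` parameter are hypothetical. Confidently clustered texts train a multinomial naïve Bayes classifier (add-one smoothing), with cluster indices standing in for class labels.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Turn tokenized texts into term-frequency vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d:
            v[index[w]] += 1.0
        vecs.append(v)
    return vecs


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(v, centroids):
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: euclidean(v, centroids[c]))


def kmeans(vecs, k, max_iter=50):
    """Plain k-means; seeding with the first k vectors is an assumption made for determinism."""
    centroids = [list(v) for v in vecs[:k]]
    for _ in range(max_iter):
        assign = [nearest(v, centroids) for v in vecs]
        new = []
        for c in range(k):
            members = [v for v, a in zip(vecs, assign) if a == c]
            # Keep the old centroid if a cluster becomes empty.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return [nearest(v, centroids) for v in vecs], centroids


def is_ambiguous(v, centroids, ratio):
    """Hypothetical fuzzy-region test: the nearest centroid is not clearly closer than the runner-up."""
    d = sorted(euclidean(v, c) for c in centroids)
    if len(d) < 2:
        return False
    return d[0] >= ratio * d[1]


def train_nb(docs, labels):
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""
    priors = Counter(labels)
    counts = {y: Counter() for y in priors}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    totals = {y: sum(c.values()) for y, c in counts.items()}
    vocab = {w for d in docs for w in d}
    return priors, counts, totals, vocab, len(docs)


def predict_nb(model, doc):
    priors, counts, totals, vocab, n = model

    def log_posterior(y):
        s = math.log(priors[y] / n)
        for w in doc:
            if w in vocab:  # words never seen in training are ignored
                s += math.log((counts[y][w] + 1) / (totals[y] + len(vocab)))
        return s

    return max(priors, key=log_posterior)


def classify_unlabeled(docs, k, ratio=0.7):
    vecs = tf_vectors(docs)
    assign, centroids = kmeans(vecs, k)
    fuzzy = [is_ambiguous(v, centroids, ratio) for v in vecs]
    # Confidently clustered texts form the naive Bayes training set.
    train = [(d, y) for d, y, f in zip(docs, assign, fuzzy) if not f]
    if not train:
        return assign
    model = train_nb([d for d, _ in train], [y for _, y in train])
    # Only texts in the fuzzy region are reclassified.
    return [predict_nb(model, d) if f else y for d, y, f in zip(docs, assign, fuzzy)]


docs = [
    ["ball", "goal", "team"],
    ["stock", "market", "price"],
    ["goal", "team", "match"],
    ["market", "price", "bank"],
    ["ball", "match", "team"],
    ["bank", "stock", "price", "price"],
    ["team", "price"],  # mixes both topics
]
print(classify_unlabeled(docs, k=2))  # expected: [0, 1, 0, 1, 0, 1, 1]
```

In this toy corpus k-means first places the mixed text `["team", "price"]` in cluster 0, but it lies in the assumed fuzzy region, so the naive Bayes classifier trained on the six confidently clustered texts moves it to cluster 1. Using a distance ratio rather than an absolute distance keeps the fuzzy-region test independent of vector scale; the paper may use a different criterion.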
