Abstract:
Exploiting the complementary characteristics of unsupervised clustering and naïve Bayes classification, this paper proposes a method for classifying texts that carry no class labels. Unsupervised clustering first produces a pre-classification of the texts, and some of these results serve as training samples for a naïve Bayes classifier. The trained classifier then reclassifies the texts that fall in the ambiguous regions of the clustering result, where cluster membership is unclear. In this way the method classifies unlabeled texts and yields more accurate classification results.
Key words:
Text classification,
Unsupervised text clustering,
Naïve Bayes classification,
Euclidean distance
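The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the clustering algorithm or define the ambiguous region, so the sketch assumes k-means with Euclidean distance on raw term counts, and treats a text as ambiguous when the ratio of its distances to the nearest and second-nearest centroids exceeds a threshold. All function names, the `ratio` parameter, and the Laplace smoothing are assumptions made for illustration.

```python
import math
from collections import Counter

def vectorize(docs):
    """Turn whitespace-tokenized texts into term-count vectors."""
    vocab = sorted({w for d in docs for w in d.split()})
    return [[Counter(d.split())[w] for w in vocab] for d in docs]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vecs, seeds, iters=20):
    """Plain k-means; `seeds` are indices of the initial centroids (assumed)."""
    centroids = [list(vecs[i]) for i in seeds]
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda k: euclid(v, centroids[k]))
                  for v in vecs]
        for k in range(len(centroids)):
            members = [v for v, l in zip(vecs, labels) if l == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split_by_confidence(vecs, centroids, ratio):
    """Assign each text to its nearest centroid and flag ambiguous ones.

    Assumed criterion: a text is ambiguous when nearest / second-nearest
    distance exceeds `ratio`. Requires at least two centroids.
    """
    labels, ambiguous = [], []
    for i, v in enumerate(vecs):
        dists = sorted((euclid(v, c), k) for k, c in enumerate(centroids))
        (d1, k1), (d2, _) = dists[0], dists[1]
        labels.append(k1)
        if d2 == 0 or d1 / d2 > ratio:
            ambiguous.append(i)
    return labels, ambiguous

def train_nb(vecs, labels, idx, n_classes):
    """Multinomial naive Bayes with Laplace smoothing, trained on texts in `idx`."""
    prior = [1.0] * n_classes
    term = [[1.0] * len(vecs[0]) for _ in range(n_classes)]
    for i in idx:
        prior[labels[i]] += 1
        for j, count in enumerate(vecs[i]):
            term[labels[i]][j] += count
    log_prior = [math.log(p / sum(prior)) for p in prior]
    log_like = [[math.log(t / sum(row)) for t in row] for row in term]
    return log_prior, log_like

def predict_nb(model, v):
    log_prior, log_like = model
    scores = [lp + sum(c * ll for c, ll in zip(v, row))
              for lp, row in zip(log_prior, log_like)]
    return max(range(len(scores)), key=scores.__getitem__)

def classify_unlabeled(docs, seeds, ratio=0.8):
    """Cluster, train naive Bayes on confident texts, reclassify ambiguous ones."""
    vecs = vectorize(docs)
    centroids = kmeans(vecs, seeds)
    labels, ambiguous = split_by_confidence(vecs, centroids, ratio)
    flagged = set(ambiguous)
    confident = [i for i in range(len(docs)) if i not in flagged]
    model = train_nb(vecs, labels, confident, len(centroids))
    for i in ambiguous:
        labels[i] = predict_nb(model, vecs[i])
    return labels, ambiguous
```

Calling `classify_unlabeled(docs, seeds=[0, 2], ratio=0.4)` on a handful of short texts returns the final labels together with the indices that were handed to the naïve Bayes classifier. The threshold controls how much of the clustering result is trusted as training data: a lower `ratio` sends more texts to reclassification but leaves fewer training samples.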
JIANG Zhifang; ZHU Cuiling; WU Qiang. Method of Unlabeled Texts Classification[J]. Computer Engineering, 2007, 33(12): 96-98.
蒋志方;祝翠玲;吴 强. 一个对不带类别标记文本进行分类的方法[J]. 计算机工程, 2007, 33(12): 96-98.