Abstract:
Centroid-based text categorization algorithms perform poorly when the documents of a class are widely dispersed or their distribution has more than one peak. To address this problem, this paper proposes an improved text categorization algorithm that outperforms the classical centroid-based algorithm. Experimental results on the text categorization corpus provided by Wisers Information Limited (Hong Kong) show that the algorithm achieves satisfactory efficiency and precision.
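The multi-peak failure mode of the classical centroid classifier can be illustrated with a small sketch. It is not the improved algorithm proposed in the paper, whose details the abstract does not give. The toy two-dimensional vectors, the labels "A" and "B", and all function names are assumptions made for illustration. A K-nearest-neighbor vote is included only as a point of comparison, since KNN appears among the keywords.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a class's document vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify_by_centroid(doc, training):
    """Classical rule: assign the class whose centroid is most similar to doc."""
    centroids = {label: centroid(vs) for label, vs in training.items()}
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))

def classify_by_knn(doc, training, k=3):
    """Majority vote among the k training documents most similar to doc."""
    scored = sorted(
        ((cosine(doc, v), label) for label, vs in training.items() for v in vs),
        reverse=True,
    )
    votes = {}
    for _, label in scored[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical toy data: class "A" has two peaks (near each axis),
# class "B" has a single peak lying between them.
training = {
    "A": [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
          [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]],
    "B": [[1.0, 0.3], [1.0, 0.35], [0.9, 0.3]],
}

# The query sits inside the first peak of class "A".
query = [1.0, 0.05]

centroid_label = classify_by_centroid(query, training)
knn_label = classify_by_knn(query, training, k=3)
print(centroid_label, knn_label)
```

In this toy setting the centroid of class "A" falls between its two peaks, so a query lying inside one peak is closer to the centroid of "B" and is misassigned, whereas the neighbor vote recovers "A". Real document vectors are high-dimensional and term-weighted, so the sketch shows only the geometry of the problem.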
Key words:
text categorization,
centroid,
K-Nearest Neighbor (KNN)
CHAI Yu-mei; ZHU Guo-zhong; ZAN Hong-ying; HU Da-ming; XIAN Jia-yang. Text Categorization Algorithm Based on Centroid[J]. Computer Engineering, 2009, 35(20): 83-85.
柴玉梅;朱国重;昝红英;胡达明;冼家扬. 基于质心的文本分类算法[J]. 计算机工程, 2009, 35(20): 83-85.