
Computer Engineering (计算机工程) ›› 2009, Vol. 35 ›› Issue (20): 83-85. doi: 10.3969/j.issn.1000-3428.2009.20.029

• Software Technology and Database •

Text Categorization Algorithm Based on Centroid

CHAI Yu-mei (柴玉梅)¹, ZHU Guo-zhong (朱国重)¹, ZAN Hong-ying (昝红英)¹, HU Da-ming (胡达明)², XIAN Jia-yang (冼家扬)²

  1. College of Information Engineering, Zhengzhou University, Zhengzhou 450001; 2. Wisers Information Limited, Hong Kong
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-10-20 Published: 2009-10-20

Abstract: Centroid-based text categorization performs poorly when the documents of a class are widely dispersed or their distribution has more than one peak. To address this problem, this paper proposes an improved text categorization algorithm that outperforms the classical centroid-based algorithm. Experimental results on a text categorization corpus provided by Wisers Information Limited show that the algorithm achieves satisfactory efficiency and precision.

Key words: text categorization, centroid, K-Nearest Neighbor (KNN)
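
The weakness described in the abstract can be illustrated with a minimal sketch of the classical centroid classifier. This is only the baseline, not the improved algorithm proposed in the paper, whose details are not given on this page. The two-dimensional vectors, the labels "A" and "B", the use of cosine similarity, and all function names are assumptions made for illustration. The nearest-neighbour rule is included purely for contrast, since KNN appears in the key words.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(docs):
    """Average the document vectors of each class into one centroid per class."""
    groups = defaultdict(list)
    for vec, label in docs:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def classify_by_centroid(centroids, vec):
    """Classical rule: pick the class whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


def classify_by_nearest_neighbor(docs, vec):
    """1-nearest-neighbour rule, shown only for contrast."""
    return max(docs, key=lambda doc: cosine(doc[0], vec))[1]


# Hypothetical toy data: class "A" has two peaks (near each axis),
# class "B" is a single tight cluster lying between them.
TRAIN = [
    ([1.0, 0.0], "A"), ([0.9, 0.1], "A"),
    ([0.0, 1.0], "A"), ([0.1, 0.9], "A"),
    ([0.7, 0.3], "B"), ([0.6, 0.4], "B"),
]

centroids = train_centroids(TRAIN)
query = [0.95, 0.05]  # lies inside the first peak of class "A"

centroid_label = classify_by_centroid(centroids, query)
nn_label = classify_by_nearest_neighbor(TRAIN, query)
print("centroid rule:", centroid_label)
print("nearest neighbour:", nn_label)
```

In this toy setting, averaging the two peaks of class "A" places its centroid between them, so the query that sits inside one peak is assigned to "B" by the centroid rule, while the nearest-neighbour rule assigns it to "A".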
