
Computer Engineering, 2010, Vol. 36, Issue (8): 66-68. doi: 10.3969/j.issn.1000-3428.2010.08.023

• Software Technology and Database

Study on Optimal Clustering Number in Mass Chinese Short Message Text

LIU Jin-ling

  1. (Department of Computer, Huaiyin Institute of Technology, Huai'an 223003)
  • Online: 2010-04-20 Published: 2010-04-20

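Abstract: To address the problem of determining the number of clusters for massive collections of Chinese short message (SMS) texts, this paper proposes a method for determining the optimal number of clusters that is based on the clustering process itself. A single scan of the data yields several statistics, and incremental, level-by-level partitioning then finds the number of clusters corresponding to the best partition, which gives the optimal solution. Experimental results show that, compared with other methods, this method achieves higher classification efficiency.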

Key words: clustering, cluster number, increment, partition
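The abstract does not state which statistics are gathered, how each level is split, or which criterion marks the best partition, so the following is only a hypothetical Python sketch of the general idea, not the paper's algorithm. Its assumptions: each text is already a numeric vector (for example a TF-IDF vector); the single-scan statistics are BIRCH-style clustering-feature triples (count, linear sum, sum of squared norms); each level splits the cluster with the largest within-cluster error using a deterministic 2-means; and each level is scored with the Calinski-Harabasz index, chosen here as a stand-in criterion. The names `CF`, `bisect` and `best_k` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


@dataclass
class CF:
    """Assumed BIRCH-style clustering feature: count, linear sum, sum of squared norms."""
    n: int
    ls: Vector
    ss: float

    @staticmethod
    def of(points: Sequence[Vector]) -> "CF":
        # A single pass over the points accumulates all three statistics.
        ls = [0.0] * len(points[0])
        ss = 0.0
        for p in points:
            for j, x in enumerate(p):
                ls[j] += x
            ss += sum(x * x for x in p)
        return CF(len(points), ls, ss)

    def minus(self, other: "CF") -> "CF":
        # CF triples are additive, so the complement of a sub-cluster needs no rescan.
        return CF(self.n - other.n,
                  [a - b for a, b in zip(self.ls, other.ls)],
                  self.ss - other.ss)

    def sse(self) -> float:
        # Within-cluster sum of squared errors: SS - ||LS||^2 / n.
        return max(0.0, self.ss - sum(x * x for x in self.ls) / self.n)


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def bisect(points: Sequence[Vector], iters: int = 10) -> Tuple[List[Vector], List[Vector]]:
    """Split one cluster in two with a deterministic 2-means (hypothetical splitting rule)."""
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))  # seed 1: farthest from the centroid
    b = max(points, key=lambda p: sq_dist(p, a))  # seed 2: farthest from seed 1
    left: List[Vector] = []
    right: List[Vector] = []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            (left if sq_dist(p, a) <= sq_dist(p, b) else right).append(p)
        if not left or not right:
            break
        a, b = centroid(left), centroid(right)
    return left, right


def best_k(points: Sequence[Vector], k_max: int) -> Tuple[int, Dict[int, float]]:
    """Divisive, level-by-level partitioning; each level k is scored with Calinski-Harabasz."""
    n = len(points)
    total = CF.of(points)              # one scan of the whole data set
    sst = total.sse()                  # total sum of squares
    clusters = [(list(points), total)]
    scores: Dict[int, float] = {}
    while len(clusters) < k_max:
        # Next level: split the cluster with the largest SSE.
        i = max(range(len(clusters)), key=lambda j: clusters[j][1].sse())
        pts, cf = clusters[i]
        if len(pts) < 2 or cf.sse() <= 1e-12:
            break
        left, right = bisect(pts)
        if not left or not right:
            break
        left_cf = CF.of(left)
        clusters[i:i + 1] = [(left, left_cf), (right, cf.minus(left_cf))]
        k = len(clusters)
        ssw = sum(c.sse() for _, c in clusters)
        if k < n and ssw > 0:
            # CH(k) = [(SST - SSW) / (k - 1)] / [SSW / (n - k)]
            scores[k] = ((sst - ssw) / (k - 1)) / (ssw / (n - k))
    best = max(scores, key=scores.get) if scores else 1
    return best, scores
```

For example, on twelve 2-D points forming three well-separated unit squares, `best_k(points, 6)` scores k = 3 highest. Keeping per-cluster statistics additive means the error of every cluster at every level comes from the stored triples rather than from a fresh pass over all the data; only the cluster being split is revisited.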
