Abstract:
Based on the characteristics of Chinese short message text, this paper presents a method, built on the clustering process, for determining the optimal number of clusters. The method obtains multiple statistics by scanning the data only once. It then uses incremental layer-by-layer division to find the number of clusters that corresponds to the optimal division, which gives the optimal solution. Experimental results show that the method yields higher-quality results than other methods.
Key words:
clustering,
cluster number,
increment,
division
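The abstract does not specify the statistics, the split rule, or the validity index, so the following is only a minimal Python sketch under stated assumptions, not the paper's method. It assumes each short message has already been converted to a numeric vector (for example by TF-IDF). It keeps BIRCH-style sufficient statistics per cluster (count, linear sum, sum of squared norms), divides the cluster with the largest within-cluster error by 2-means at each layer, and scores each cluster count with the Calinski-Harabasz index computed from those statistics alone. The names `ClusterStats`, `bisect`, `calinski_harabasz`, and `optimal_cluster_number` are hypothetical. The global statistics come from one pass over the data; each split re-scans only the members of the cluster being divided.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class ClusterStats:
    """Sufficient statistics of one cluster (assumed BIRCH-style values)."""
    n: int              # number of messages in the cluster
    ls: List[float]     # linear sum of member vectors
    ss: float           # sum of squared norms of member vectors
    members: List[int]  # indices of member vectors

    @classmethod
    def from_points(cls, points: Sequence[Vector], idx) -> "ClusterStats":
        idx = list(idx)
        ls = [0.0] * len(points[0])
        ss = 0.0
        for i in idx:  # a single pass accumulates every statistic at once
            for d, x in enumerate(points[i]):
                ls[d] += x
                ss += x * x
        return cls(len(idx), ls, ss, idx)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def sse(self) -> float:
        # Within-cluster squared error without revisiting the points: SS - |LS|^2 / n
        return self.ss - sum(v * v for v in self.ls) / self.n


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bisect(points, stats: ClusterStats,
           iters: int = 10) -> Optional[Tuple[ClusterStats, ClusterStats]]:
    """Split one cluster with 2-means; return None if it cannot be divided."""
    centre = stats.centroid()
    # Farthest-point seeding (an assumption) keeps the sketch deterministic.
    c1 = points[max(stats.members, key=lambda i: sq_dist(points[i], centre))]
    c2 = points[max(stats.members, key=lambda i: sq_dist(points[i], c1))]
    best = None
    for _ in range(iters):
        left, right = [], []
        for i in stats.members:
            closer_to_c1 = sq_dist(points[i], c1) <= sq_dist(points[i], c2)
            (left if closer_to_c1 else right).append(i)
        if not left or not right:
            break  # degenerate split: keep the last valid one (None if none)
        best = (ClusterStats.from_points(points, left),
                ClusterStats.from_points(points, right))
        c1, c2 = best[0].centroid(), best[1].centroid()
    return best


def calinski_harabasz(total: ClusterStats, clusters: List[ClusterStats]) -> float:
    """CH index from cached statistics only, using total SSE = within + between."""
    k, n = len(clusters), total.n
    within = sum(c.sse() for c in clusters)
    between = total.sse() - within
    if within <= 0:
        return float("inf")  # every cluster is a perfect fit
    return (between / (k - 1)) / (within / (n - k))


def optimal_cluster_number(points, k_max: int = 8):
    """Return (best k, {k: CH score}) found by incremental layer-by-layer division."""
    total = ClusterStats.from_points(points, range(len(points)))  # one global scan
    clusters, scores = [total], {}
    for k in range(2, min(k_max, len(points) - 1) + 1):
        split, target = None, None
        # Divide the cluster with the largest error first; fall back to the next one.
        for target in sorted(clusters, key=ClusterStats.sse, reverse=True):
            split = bisect(points, target)
            if split is not None:
                break
        if split is None:
            break  # nothing left that can be divided
        clusters = [c for c in clusters if c is not target] + list(split)
        scores[k] = calinski_harabasz(total, clusters)
    best_k = max(scores, key=scores.get) if scores else 1
    return best_k, scores
```

On three well-separated groups of four 2-D points, `optimal_cluster_number(points, k_max=6)` selects 3: the between-cluster spread jumps at that layer, while later splits shrink the within-cluster error only slightly. Farthest-point seeding is used instead of random starts because a random 2-means start can cut two groups horizontally and settle in a poor local optimum.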
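Abstract: To address the problem of determining the number of clusters for massive Chinese short message text, this paper proposes a method, based on the clustering process, for determining the optimal number of clusters for short message text. Multiple statistics are obtained by scanning the data only once, and incremental layer-by-layer division yields the number of clusters that corresponds to the optimal division, giving the optimal solution. Experimental results show that, compared with other methods, this method achieves higher classification efficiency.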
CLC Number:
LIU Jin-ling. Study on Optimal Clustering Number in Mass Chinese Short Message Text[J]. Computer Engineering, 2010, 36(8): 66-68.