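Abstract:
Based on the characteristics of short message text, a density-based clustering method for Chinese short messages is presented. The method partitions the high-density regions of the text data into clusters and builds a seed queue, sorted in ascending order of reachable similarity, that stores the short message texts awaiting expansion. It selects the objects that are reachable at a large similarity threshold; that is, it quickly locates text objects in dense regions so that higher-density clusters are completed first. Experimental results show that the method is about 10 times more efficient than K-means.
Key words:
density,
cluster,
neighborhood,
short message text,
clustering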
Abstract:
Based on the characteristics of short message text, this paper presents a density-based clustering method for Chinese short messages. High-density regions of the text data are partitioned into clusters, and a seed queue, sorted in ascending order of reachable similarity, is constructed to store the short message texts awaiting expansion. The messages are processed in a specific order: objects reachable at a larger similarity threshold are selected first, so that text objects in dense regions are located quickly and higher-density clusters are completed first. Experimental results show that the method is about 10 times more efficient than K-means.
Key words:
density,
cluster,
neighborhood,
short message text,
clustering
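The expansion order described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the character-bigram features, the Jaccard similarity, the brute-force pairwise comparison, and the names `density_cluster`, `min_sim`, and `min_pts` are all assumptions made for illustration. The seed queue is modelled as a heap keyed on negated similarity, so the most similar pending message is expanded first.

```python
import heapq


def bigrams(text):
    """Character bigrams (assumed feature set; no word segmenter needed)."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def similarity(a, b):
    """Jaccard similarity of two feature sets, in [0, 1] (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def density_cluster(messages, min_sim=0.5, min_pts=2):
    """Label each message with a cluster id, or -1 if it stays unassigned.

    min_sim: similarity a pair must reach to count as neighbours (hypothetical).
    min_pts: neighbours (excluding itself) a message needs to be a core object.
    """
    feats = [bigrams(m) for m in messages]
    n = len(messages)
    # Brute-force O(n^2) similarity matrix; fine for a sketch, not for mass data.
    sim = [[similarity(feats[i], feats[j]) for j in range(n)] for i in range(n)]
    neighbours = [
        [j for j in range(n) if j != i and sim[i][j] >= min_sim]
        for i in range(n)
    ]
    labels = [-1] * n
    cluster_id = 0
    # Start from the densest objects so that dense clusters are built first.
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    for start in order:
        if labels[start] != -1 or len(neighbours[start]) < min_pts:
            continue
        labels[start] = cluster_id
        # Seed queue: heapq is a min-heap, so similarities are negated to pop
        # the most similar pending message first.
        seeds = [(-sim[start][j], j) for j in neighbours[start]]
        heapq.heapify(seeds)
        while seeds:
            _, obj = heapq.heappop(seeds)
            if labels[obj] != -1:
                continue
            labels[obj] = cluster_id
            if len(neighbours[obj]) >= min_pts:  # core object: keep expanding
                for j in neighbours[obj]:
                    if labels[j] == -1:
                        heapq.heappush(seeds, (-sim[obj][j], j))
        cluster_id += 1
    return labels
```

Calling `density_cluster(messages, min_sim=0.5, min_pts=1)` on a small list of near-duplicate strings places each group of near-duplicates in its own cluster and leaves unrelated messages labelled `-1`. A practical system would replace the quadratic similarity matrix with an index, which this sketch does not attempt.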
周泓, 刘金岭. 海量中文短信文本密度聚类研究[J]. 计算机工程, 2010, 36(22): 81-82.
ZHOU Hong, LIU Jin-Ling. Study on Mass Chinese Short Message Text Density Clustering[J]. Computer Engineering, 2010, 36(22): 81-82.