
Computer Engineering, 2010, Vol. 36, Issue (22): 81-82.

• Software Technology and Database •

Study on Mass Chinese Short Message Text Density Clustering

ZHOU Hong, LIU Jin-ling

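  1. (Faculty of Computer Engineering, Huaiyin Institute of Technology, Huaian, Jiangsu 223003, China)
  • Publication date: 2010-11-20; Release date: 2010-11-18
  • About the authors: ZHOU Hong (1980-), female, master's candidate; research interests: data warehousing, text data mining. LIU Jin-ling, professor.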

Abstract:

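Based on the characteristics of short message (SMS) text, this paper presents a density-based clustering method for Chinese short messages. The method partitions the high-density regions of the text data into clusters. It builds a seed queue, sorted in ascending order of reachable similarity, that stores the messages still awaiting expansion, so messages are processed in a specific order. Objects that are reachable under a large similarity threshold are selected first. In other words, text objects in dense regions are located quickly, so higher-density clusters are completed first. Experimental results show that the method is about 10 times as efficient as K-means.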
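
The abstract does not specify the text representation, the similarity measure or the thresholds. What follows is therefore only a minimal Python sketch of the idea under stated assumptions, and it is not the authors' implementation. In the sketch, messages are represented as character-bigram count vectors and compared by cosine similarity. `eps_sim` and `min_pts` are hypothetical neighbourhood parameters. The "reachable similarity" of a queued message is taken to be its similarity to the core message that queued it. The seed queue is kept in ascending order, and expansion always pops the entry with the highest similarity, so the densest neighbourhoods are absorbed first.

```python
from bisect import insort
from collections import Counter
from math import sqrt


def bigrams(text):
    """Character-bigram counts: an assumed stand-in for Chinese word segmentation."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return Counter(chars)
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def push_neighbours(p, neigh, sim, labels, seeds):
    """Queue p's unassigned neighbours, keeping seeds sorted ascending by similarity."""
    for q in neigh[p]:
        if labels[q] == -1:
            insort(seeds, (sim[p][q], q))


def density_cluster(texts, eps_sim=0.3, min_pts=2):
    """Return one cluster label per text; -1 marks noise (non-core texts never reached)."""
    vecs = [bigrams(t) for t in texts]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    # eps-neighbourhood (hypothetical definition): other texts at least eps_sim similar
    neigh = [[j for j in range(n) if j != i and sim[i][j] >= eps_sim] for i in range(n)]
    is_core = [len(neigh[i]) + 1 >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster_id
        seeds = []  # (reachable similarity, index) pairs in ascending order
        push_neighbours(start, neigh, sim, labels, seeds)
        while seeds:
            _, q = seeds.pop()  # the highest reachable similarity sits at the end
            if labels[q] != -1:
                continue  # already labelled through an earlier, higher-similarity entry
            labels[q] = cluster_id
            if is_core[q]:  # only core texts keep expanding the cluster
                push_neighbours(q, neigh, sim, labels, seeds)
        cluster_id += 1
    return labels


texts = ["明天下午三点开会", "明天下午三点开会请准时",
         "今晚一起吃饭吗", "今晚一起吃饭好吗", "快递已到请取件"]
print(density_cluster(texts))  # [0, 0, 1, 1, -1]
```

With these definitions, the descending-similarity order mainly changes the order in which members join a cluster. Cluster membership matches a DBSCAN-style expansion, except possibly for border texts that lie within reach of two clusters.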

Key words: density, cluster, neighborhood, short message text, clustering
