
Computer Engineering ›› 2011, Vol. 37 ›› Issue (12): 50-52. doi: 10.3969/j.issn.1000-3428.2011.12.017

• Networks and Communications •

Text Clustering Based on Improved DBSCAN Algorithm

CAI Yue, YUAN Jin-sheng   

  1. (School of Information, Beijing Forestry University, Beijing 100083, China)
  • Received: 2010-12-10 Online: 2011-06-20 Published: 2011-06-20

基于改进DBSCAN算法的文本聚类

蔡 岳,袁津生   

  1. (北京林业大学信息学院,北京 100083)
  • About the authors: CAI Yue (b. 1984), male, M.S.; main research interests: search engines and network security. YUAN Jin-sheng, professor.

Abstract: Most clustering algorithms cannot meet the speed and self-adaptation requirements of text clustering. This paper reviews the basic principles and implementation of the DBSCAN algorithm and proposes a text clustering algorithm based on an improved DBSCAN. The least squares method is used to reduce the dimensionality of the text vectors, and a cluster-tree structure is built to make the algorithm strongly self-adaptive. Experimental results show that the algorithm clusters text adaptively and achieves higher accuracy than DBSCAN.

Key words: DBSCAN algorithm, text clustering, least squares method, cluster-tree

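Abstract (Chinese version): At present, most clustering algorithms do not adapt well to the demand for fast, self-adaptive text clustering. To address this, the paper describes the basic principles and implementation process of the DBSCAN algorithm and proposes a text clustering algorithm based on an improved DBSCAN algorithm, which uses the least squares method to reduce the dimensionality of the text vectors and builds a cluster-relationship tree structure for use with DBSCAN. Experimental results show that the algorithm can cluster text adaptively and is more accurate than DBSCAN.

Key words (Chinese version): DBSCAN algorithm, text clustering, least squares method, cluster-relationship tree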
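
For readers unfamiliar with the baseline the paper builds on, the following is a minimal sketch of standard DBSCAN applied to small term-frequency vectors. It is not the improved algorithm proposed in the paper: the least-squares dimensionality reduction and the cluster-tree are not specified in the abstract and are not reproduced here. The cosine distance, the toy vocabulary, the sample vectors, and the values of `eps` and `min_pts` are all assumptions chosen for illustration.

```python
import math
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns one cluster label per point (NOISE = -1)."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels


# Hypothetical term counts over the vocabulary [forest, tree, network, router].
docs = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 3, 3],
    [1, 0, 0, 1],
]
labels = dbscan(docs, eps=0.1, min_pts=2)
print(labels)
```

With these assumed parameters the first three documents form one cluster, the next three form a second, and the last document is labeled as noise. The brute-force neighbor search is quadratic in the number of documents, which is one reason high-dimensional text vectors make plain DBSCAN slow and why reducing their dimensionality, as the abstract describes, is attractive.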
