Computer Engineering ›› 2011, Vol. 37 ›› Issue (12): 50-52. doi: 10.3969/j.issn.1000-3428.2011.12.017

• Software Technology and Database •

Text Clustering Based on Improved DBSCAN Algorithm

CAI Yue, YUAN Jin-sheng

  1. (School of Information, Beijing Forestry University, Beijing 100083, China)
  • Received: 2010-12-10 Online: 2011-06-20 Published: 2011-06-20
  • About the authors: CAI Yue (born 1984), male, master's degree, main research interests: search engines and network security; YUAN Jin-sheng, professor

Abstract: Most current clustering algorithms do not adapt well to the need for fast, self-adaptive text clustering. To address this, the paper describes the basic principle and implementation of the DBSCAN algorithm and proposes a text clustering algorithm based on an improved DBSCAN. The least squares method is used to reduce the dimensionality of the text vectors, and a cluster relation tree (cluster-tree) structure is created for use with DBSCAN. Experimental results show that the algorithm clusters text self-adaptively and achieves higher accuracy than DBSCAN.

Key words: DBSCAN algorithm, text clustering, least squares method, cluster-tree
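
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of standard DBSCAN only. It is not the improved algorithm proposed in the paper: the least squares dimensionality reduction and the cluster-tree are not reproduced, because the abstract gives no details of them. The names `dbscan`, `region_query`, `eps` and `min_pts`, the use of Euclidean distance, and the toy data are assumptions chosen for illustration.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN: return one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Toy 2-D data standing in for (already reduced) text vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this sketch the result depends entirely on the hand-picked `eps` and `min_pts`, which is the kind of manual tuning that a self-adaptive variant would aim to reduce.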

CLC number: