
Computer Engineering ›› 2008, Vol. 34 ›› Issue (17): 65-67,7. doi: 10.3969/j.issn.1000-3428.2008.17.024

• Software Technology and Database •

Research of Distributed Clustering Algorithm Based on Density

ZHENG Jin-bin1, ZHUO Yi-bao2

  1. College of Mathematics and Computer Science, Longyan University, Longyan 364000; 2. Department of Computer Science, Xiamen University, Xiamen 361005
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-09-05 Published: 2008-09-05

Abstract: Large volumes of complex, heterogeneous data are distributed across networked sites, and distributed clustering is an important application in massive data processing. This paper proposes an improvement to the Density Based Distributed Clustering (DBDC) algorithm. Local clustering at each site is used to obtain better representative objects, and the set of representatives, together with related information, is sent to the main site. There, an enhanced density-based clustering algorithm performs global clustering, and the clustering at each sub-site is then updated. Theoretical analysis and experimental results indicate that the proposed algorithm outperforms DBDC in both clustering quality and efficiency.

Key words: data mining, distributed clustering, specific core objects
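
The flow summarized in the abstract (local clustering, representatives sent to the main site, global clustering, update of the sub-sites) can be illustrated with a minimal Python sketch. It is not the authors' improved algorithm, whose details are not given on this page. It assumes a plain DBSCAN at every stage, takes local core points as the representatives, sends only their coordinates (no additional information), and uses a global radius of twice the local one with a global minimum of one point. All function names, parameter values, and sample points are hypothetical.

```python
from math import dist

NOISE = -1

def region(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, set of core point indices)."""
    labels = [None] * len(points)
    core = set()
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        core.add(i)
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(points, j, eps)
            if len(nb) >= min_pts:     # j is a core point: keep expanding
                core.add(j)
                queue.extend(nb)
        cluster += 1
    return labels, core

def local_representatives(points, eps, min_pts):
    """Sub-site step (assumption: representatives are the local core points)."""
    _, core = dbscan(points, eps, min_pts)
    return [points[i] for i in sorted(core)]

def global_model(reps_per_site, eps_global):
    """Main-site step: cluster all representatives (assumption: min_pts=1)."""
    all_reps = [r for reps in reps_per_site for r in reps]
    rep_labels, _ = dbscan(all_reps, eps_global, min_pts=1)
    return all_reps, rep_labels

def relabel(points, all_reps, rep_labels, eps):
    """Sub-site update: take the global label of the nearest representative
    within eps; points with no representative that close are noise."""
    out = []
    for p in points:
        k = min(range(len(all_reps)), key=lambda r: dist(p, all_reps[r]))
        out.append(rep_labels[k] if dist(p, all_reps[k]) <= eps else NOISE)
    return out

# Hypothetical data: one chain of points split across two sites,
# one outlier on site A and one separate cluster on site B.
site_a = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (50, 50)]
site_b = [(5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (20, 20), (21, 20), (22, 20)]
EPS, MIN_PTS = 1.6, 3

reps = [local_representatives(s, EPS, MIN_PTS) for s in (site_a, site_b)]
all_reps, rep_labels = global_model(reps, eps_global=2 * EPS)
labels_a = relabel(site_a, all_reps, rep_labels, EPS)
labels_b = relabel(site_b, all_reps, rep_labels, EPS)
print(labels_a)  # [0, 0, 0, 0, 0, -1]
print(labels_b)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

In this toy run only 7 of the 14 points travel to the main site, and the chain that was split between the two sites ends up with a single global label. How well the two halves merge depends on how the representatives and the global radius are chosen, which is the part of the pipeline the paper reports improving.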
