Computer Engineering ›› 2011, Vol. 37 ›› Issue (17): 23-25. doi: 10.3969/j.issn.1000-3428.2011.17.006

• Software Technology and Database •

Research on Bisecting K-Means Clustering Algorithm Optimization and Parallelism

ZHANG Jun-wei (张军伟)1, WANG Nian-bin (王念滨)1, HUANG Shao-bin (黄少滨)1, MAN Shi-ming (蔄世明)2

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; 2. College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2011-03-18 Online: 2011-09-05 Published: 2011-09-05
  • About the authors: ZHANG Jun-wei (born 1971), male, master's student, main research interest: parallel data processing; WANG Nian-bin and HUANG Shao-bin, professors, Ph.D.; MAN Shi-ming, M.S.
  • Supported by:
    National Natural Science Foundation of China (60973028); National Key Technology R&D Program of China (2009BAH42B02)

Abstract: The Bisecting K-Means (BKM) clustering algorithm is slow at selecting the initial centroids for each bisecting step. To address this, the two points separated by the maximum distance are taken as the initial centroids of each bisection, which speeds up the algorithm. The paper also studies how to cluster quickly on a cluster system: based on the characteristics of BKM, the algorithm is parallelized using data parallelism and a uniform data-partitioning strategy. Experimental results show that the improved algorithm achieves a good speedup and high efficiency.

Key words: data mining, clustering algorithm, Bisecting K-Means (BKM), parallelization, cluster system
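The two ideas in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several choices here are assumptions made for illustration: the function names (`farthest_pair`, `partial_sums`, `bisect`, `bisecting_kmeans`) are hypothetical, the far-apart pair is found with a cheap heuristic (farthest point from the mean, then farthest point from that one) rather than an exact search, the largest cluster is the one split at each step, and threads stand in for the nodes of a cluster system.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def farthest_pair(points):
    """Heuristic (assumed): farthest point from the mean, then farthest from that point."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    a = max(points, key=lambda p: math.dist(p, mean))
    b = max(points, key=lambda p: math.dist(p, a))
    return a, b


def partial_sums(chunk, centroids):
    """Per-worker step: assign each point of the chunk, accumulate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        k = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def bisect(points, workers=2, iters=10):
    """Split one cluster in two with 2-means seeded by farthest_pair."""
    centroids = [list(c) for c in farthest_pair(points)]
    dim = len(centroids[0])
    # Uniform partition: worker w receives every workers-th point.
    chunks = [c for c in (points[w::workers] for w in range(workers)) if c]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            results = list(pool.map(partial_sums, chunks, [centroids] * len(chunks)))
            # Reduce step: combine the partial sums into new centroids.
            for k in range(2):
                n = sum(counts[k] for _, counts in results)
                if n:
                    centroids[k] = [sum(s[k][d] for s, _ in results) / n for d in range(dim)]
    left, right = [], []
    for p in points:
        nearer_first = math.dist(p, centroids[0]) <= math.dist(p, centroids[1])
        (left if nearer_first else right).append(p)
    return left, right


def bisecting_kmeans(points, k, workers=2):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # assumed rule: split the largest cluster
        left, right = bisect(target, workers)
        if not left or not right:  # cluster cannot be split further
            clusters.append(target)
            break
        clusters.extend([left, right])
    return clusters
```

For example, `bisecting_kmeans(points, 3)` on three well-separated groups of 2-D tuples returns three lists of points. In Python the threads only show the structure of the data-parallel step; on a real cluster system each chunk would reside on its own node and the reduce step would be a message exchange.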
