
Computer Engineering (计算机工程) ›› 2012, Vol. 38 ›› Issue (13): 145-147, 151. doi: 10.3969/j.issn.1000-3428.2012.13.043

• Artificial Intelligence and Recognition Technology •

Parallel Implementation of K-means Algorithm with Batch Processing

LAN Yuan-dong, LIU Yu-fang, XU Tao

  1. (Department of Computer Science, Huizhou University, Huizhou 516007, China)
  • Received: 2011-10-18  Online: 2012-07-05  Published: 2012-07-05
  • About the authors: LAN Yuan-dong (b. 1975), male, Ph.D. candidate; main research interests: pattern recognition and machine learning. LIU Yu-fang, associate professor. XU Tao, Ph.D. candidate.
  • Supported by:
    Key Project of the National "863" Program in the Advanced Manufacturing Field (2006AA04A120); Guangdong Universities Outstanding Young Innovative Talents Cultivation Program (LYM09128)

Abstract: The K-means algorithm is computationally intensive, converges slowly, and is time consuming on large inputs. To address these problems, a new parallel implementation of the K-means algorithm is presented. On a General Purpose computation on Graphics Processing Unit (GPGPU) architecture, the Compute Unified Device Architecture (CUDA) is used to accelerate the algorithm. Following a batch-processing principle, the method makes more effective use of the different CUDA memory spaces to avoid access conflicts and to reduce the number of passes over the data set, thereby improving the efficiency of the K-means algorithm. Experimental results on large-scale data sets show that the algorithm achieves faster clustering.
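
The abstract describes the scheme only in outline. The CUDA sketch below illustrates one plausible realization of the batched assignment step under the stated idea: the cluster centers are staged in shared memory so each batch of points read from global memory is compared against all centers without repeated global traffic, and the host streams the data set through the device one batch at a time. The kernel name assign_batch_kernel, the constants DIM and K, and the batch size are illustrative assumptions, not taken from the paper; a full K-means iteration would additionally accumulate per-cluster sums from the labels and update the centers, which is omitted here.

// Hypothetical sketch of the batched assignment step of GPU K-means.
// Centers are cached in shared memory; points are streamed in batches.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define DIM 4   // assumed feature dimension
#define K   16  // assumed number of clusters

__global__ void assign_batch_kernel(const float *points, int batch_size,
                                    const float *centroids, int *labels)
{
    // Stage all K centers in shared memory once per block, so every point in
    // the batch is compared against the centers without extra global reads.
    __shared__ float s_centroids[K * DIM];
    for (int i = threadIdx.x; i < K * DIM; i += blockDim.x)
        s_centroids[i] = centroids[i];
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= batch_size) return;

    // One thread labels one point of the current batch with its nearest center.
    float best_dist = 1e30f;
    int best_k = 0;
    for (int c = 0; c < K; ++c) {
        float dist = 0.0f;
        for (int d = 0; d < DIM; ++d) {
            float diff = points[idx * DIM + d] - s_centroids[c * DIM + d];
            dist += diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best_k = c; }
    }
    labels[idx] = best_k;
}

int main()
{
    const int n = 1 << 18;      // assumed total number of points
    const int batch = 1 << 16;  // assumed batch size

    // Random host data stands in for a real data set in this sketch.
    std::vector<float> h_points((size_t)n * DIM), h_centroids(K * DIM);
    std::vector<int> h_labels(n);
    for (float &v : h_points)    v = rand() / (float)RAND_MAX;
    for (float &v : h_centroids) v = rand() / (float)RAND_MAX;

    float *d_points, *d_centroids;
    int *d_labels;
    cudaMalloc(&d_points, (size_t)batch * DIM * sizeof(float));
    cudaMalloc(&d_centroids, K * DIM * sizeof(float));
    cudaMalloc(&d_labels, batch * sizeof(int));
    cudaMemcpy(d_centroids, h_centroids.data(), K * DIM * sizeof(float),
               cudaMemcpyHostToDevice);

    // Stream the data set through the device one batch at a time: only the
    // current batch has to reside in GPU memory.
    for (int off = 0; off < n; off += batch) {
        int cur = std::min(batch, n - off);
        cudaMemcpy(d_points, h_points.data() + (size_t)off * DIM,
                   (size_t)cur * DIM * sizeof(float), cudaMemcpyHostToDevice);
        assign_batch_kernel<<<(cur + 255) / 256, 256>>>(d_points, cur,
                                                        d_centroids, d_labels);
        cudaMemcpy(h_labels.data() + off, d_labels, cur * sizeof(int),
                   cudaMemcpyDeviceToHost);
    }
    cudaDeviceSynchronize();
    printf("label of point 0: %d\n", h_labels[0]);

    cudaFree(d_points); cudaFree(d_centroids); cudaFree(d_labels);
    return 0;
}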

Key words: data mining, K-means algorithm, Compute Unified Device Architecture(CUDA), parallel algorithm, clustering analysis, Graphics Processing Unit(GPU)

CLC Number: