Abstract:
The K-means algorithm is computationally intensive and time-consuming, and it converges slowly. To address these problems, a new parallel implementation of the K-means algorithm is presented. On the General-Purpose computation on Graphics Processing Units (GPGPU) architecture, the Compute Unified Device Architecture (CUDA) is used to accelerate the K-means algorithm. Following a batch-processing principle, the algorithm makes better use of the various memories that CUDA provides, avoids access conflicts, and reduces the number of accesses to the data set, which improves the efficiency of K-means. Experimental results on a large-scale data set show that the algorithm clusters faster.
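The abstract does not spell out the kernel design, so the following is only a minimal CPU sketch of the general batching idea, not the authors' CUDA implementation. It assumes that "batch processing" means walking the data set in fixed-size chunks and fusing the assignment and centroid-update phases, so each point is read once per iteration. On a GPU, each batch would plausibly be a tile loaded into shared memory by a thread block (an assumption). The names `kmeans_step_batched`, `kmeans` and `batch_size` are hypothetical, as is the choice to keep the old centroid when a cluster becomes empty.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_step_batched(points, centroids, batch_size):
    """One K-means iteration over the data set in batches (hypothetical helper).

    Each point is read exactly once: it is assigned to its nearest centroid and
    immediately added to that cluster's running sum, so assignment and update
    share a single pass over the data.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels: List[int] = []
    for start in range(0, len(points), batch_size):
        # Stand-in for a tile of points copied into fast (e.g. shared) memory.
        batch = points[start:start + batch_size]
        for p in batch:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            labels.append(j)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
    # Assumption: an empty cluster keeps its previous centroid.
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return labels, new_centroids


def kmeans(points, init_centroids, batch_size=256, max_iter=100) -> Tuple[List[int], List[List[float]]]:
    """Iterate batched K-means steps until the centroids stop changing."""
    centroids = [list(map(float, c)) for c in init_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        labels, new_centroids = kmeans_step_batched(points, centroids, batch_size)
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], batch_size=3))
```

For the four sample points above, the sketch returns labels `[0, 0, 1, 1]` and centroids `[[0.0, 0.5], [10.0, 10.5]]`. The result does not depend on `batch_size`; batching only changes how the data is staged, which is where a GPU version would save memory traffic.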
Keywords:
data mining,
K-means algorithm,
Compute Unified Device Architecture (CUDA),
parallel algorithm,
clustering analysis,
Graphics Processing Unit (GPU)
CLC Number:
LAN Yuan-Dong, LIU Yu-Fang, XU Tao. Parallel Implementation of K-means Algorithm with Batch Processing[J]. Computer Engineering, 2012, 38(13): 145-147, 151.
兰远东, 刘宇芳, 徐涛. 分批处理的K-means算法并行实现[J]. 计算机工程, 2012, 38(13): 145-147,151.