作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2008, Vol. 34 ›› Issue (16): 224-226. doi: 10.3969/j.issn.1000-3428.2008.16.077

• 人工智能及识别技术 • 上一篇    下一篇

基于随机游走模型和KL-divergence的聚类算法

何会民   

  1. (邯郸学院计算机系,邯郸 056005)
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2008-08-20 发布日期:2008-08-20

Clustering Algorithm Based on Random Walk Model and KL-divergence

HE Hui-min   

  1. (Computer Science Department, Handan College, Handan 056005)
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-08-20 Published:2008-08-20

摘要: 聚类分析在数据挖掘领域有着广泛的应用,该文提出一个聚类新思路,它不需要任何参数的假设,只基于数据两两之间的相似性。该方法假设数据点之间存在随机游走关系,根据数据相似性构造随机游走过程的转移矩阵,当随机游走过程进入收敛期后,t阶转移矩阵揭示了数据点的分布。用迭代方法寻找最小的KL-divergence来对这些分布聚类。该方法具有严谨的概率理论基础,避免了传统算法需要参数假设、限于局部最优等不足。实验表明,该算法具有较优的聚类效果。

关键词: 聚类, 随机游走, KL散度

Abstract: Clustering analysis is broadly applied in data mining. This paper presents a new idea in clustering based on pair-wise similarities, and assumes no parametric statistical model. Similarities are transformed to a Markov random walk probability matrix. It is assumed the dataset is under a Markov random walk process. When the process is going into convergence, the t-step transform matrix indicates the distribution of the dataset. It uses iterative algorithm to cluster these data with the goal of decreasing KL-divergence. This method has a solid foundation of probability theory, which can avoid some insufficiency of the traditional algorithms. The experiment shows the algorithm can achieve better results than K-means and mixture models.

Key words: clustering, random walk, KL-divergence

中图分类号: