Abstract: The clustering result of the k-means algorithm depends heavily on the choice of the initial cluster centers. To address this problem, a text clustering algorithm based on improved Particle Swarm Optimization (PSO) is presented. The characteristics of PSO and the k-means algorithm are analysed. Because PSO suffers from low search precision, a tendency to become trapped in local optima, and premature convergence, a self-regulating inertia weight mechanism and a cloud mutation operator are designed to improve it. The self-regulating mechanism adjusts the inertia weight dynamically according to the degree of population evolution. The cloud mutation operator draws on the randomness and stable tendency of the cloud model and uses the global best individual to mutate particles. The proposed algorithm combines the strong global search ability of PSO with the strong local search ability of k-means. Each particle encodes a set of cluster centers, and the fitness function is the reciprocal of the sum of within-class scatter. Experimental results show that the algorithm is an accurate, efficient and stable text clustering algorithm.
Key words: Particle Swarm Optimization (PSO); self-regulating mechanism of inertia weight; degree of evolution; cloud mutation operator; k-means algorithm; text clustering
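The particle encoding and fitness function named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the flat-list particle layout, the small constant guarding against division by zero, and the names `within_class_scatter` and `fitness` are all assumptions made for illustration.

```python
import math


def within_class_scatter(centers, docs):
    """Sum, over all documents, of the distance to the nearest center.

    Assumption: Euclidean distance between document vectors and centers.
    """
    total = 0.0
    for doc in docs:
        total += min(math.dist(doc, center) for center in centers)
    return total


def fitness(particle, docs, dim):
    """Reciprocal of the within-class scatter for one particle.

    Assumption: `particle` is a flat list of k * dim numbers that is
    read as k cluster centers of dimension `dim`.
    """
    centers = [particle[i:i + dim] for i in range(0, len(particle), dim)]
    scatter = within_class_scatter(centers, docs)
    # The small constant is an assumption; it avoids division by zero
    # when every document coincides with a center.
    return 1.0 / (scatter + 1e-12)


# Toy document vectors forming two well-separated groups.
docs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
good_particle = [0.0, 0.5, 5.0, 5.5]  # centers inside each group
bad_particle = [2.0, 2.0, 3.0, 3.0]   # centers between the groups
```

Under these assumptions, a particle whose centers sit inside the groups has a smaller scatter and therefore a larger fitness than one whose centers lie between them, which is the quantity a PSO search would try to maximize.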
CLC number:
WANG Yonggui, LIN Lin, LIU Xianguo. Research on Text Clustering Algorithm Based on Improved Particle Swarm Optimization [J]. Computer Engineering (计算机工程).