Computer Engineering

• Artificial Intelligence and Recognition Technology •

Research on Text Clustering Algorithm Based on Improved Particle Swarm Optimization

WANG Yonggui, LIN Lin, LIU Xianguo

  1. (College of Software, Liaoning Technical University, Huludao 125105, China)
  • Received: 2013-10-29 Online: 2014-11-15 Published: 2014-11-13
  • About the authors: WANG Yonggui (born 1967), male, professor; main research interests: intelligent computing, cloud computing, and data mining. LIN Lin, master's degree. LIU Xianguo, lecturer.
  • Funding:
    National Natural Science Foundation of China (60903082); Liaoning Provincial Department of Education Fund (L2012113).

Abstract: The clustering result of the k-means algorithm depends heavily on the choice of initial cluster centers. To address this, a text clustering algorithm based on improved Particle Swarm Optimization (PSO) is proposed. The characteristics of PSO and k-means are analyzed. To counter PSO's low search precision, its tendency to become trapped in local optima, and its premature convergence, a self-regulating inertia weight mechanism and a cloud mutation operator are designed. The self-regulating mechanism adjusts the inertia weight dynamically according to the degree of population evolution. The cloud mutation operator exploits the randomness and stable tendency of the cloud model and uses the global best value to mutate particles. The algorithm combines the strong global search ability of PSO with the strong local search ability of k-means. Each particle encodes a set of cluster centers, and the fitness function is the reciprocal of the sum of within-class scatter. Experimental results show that the algorithm is an accurate, efficient, and stable text clustering algorithm.

Key words: Particle Swarm Optimization (PSO), self-regulating mechanism of inertia weight, degree of evolution, cloud mutation operator, k-means algorithm, text clustering
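
As a rough illustration of the particle encoding and fitness function described in the abstract, the following Python sketch scores one particle. It is not the authors' implementation: the function name `fitness`, the flat-array particle layout, the use of Euclidean distance, and the definition of scatter as the sum of distances from each document vector to its nearest center are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def fitness(particle, docs, k):
    """Reciprocal of the summed within-class scatter for one particle.

    particle : flat array of length k * dim holding k cluster centers (assumed layout)
    docs     : array of shape (n_docs, dim), e.g. TF-IDF document vectors
    k        : number of clusters
    """
    centers = particle.reshape(k, -1)
    # Euclidean distance from every document to every center, shape (n_docs, k).
    # The distance measure is an assumption; the abstract does not name one.
    dists = np.linalg.norm(docs[:, None, :] - centers[None, :, :], axis=2)
    # Assign each document to its nearest center and sum those distances.
    scatter = dists.min(axis=1).sum()
    # Small constant guards against division by zero when scatter is exactly 0.
    return 1.0 / (scatter + 1e-12)
```

Under these assumptions, a particle whose centers sit close to the natural groups in the data receives a higher fitness than one whose centers sit far from them, which is the quantity a PSO search would maximize before handing the centers to k-means for local refinement.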
