Computer Engineering, 2006, Vol. 32, Issue 6: 206-208.

• Artificial Intelligence and Recognition Technology •

A Novel Text Clustering Algorithm Based on Niching Technique

ZHAO Yaqin1, ZHOU Xianzhong2   

  1. Department of Automation, Nanjing University of Science & Technology, Nanjing 210094; 2. School of Management and Engineering, Nanjing University, Nanjing 210093
  • Online: 2006-03-20 Published: 2006-03-20

A New Chinese Text Clustering Method Based on a Niching Genetic Algorithm

赵亚琴1,周献中2   

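  1. Department of Automation, Nanjing University of Science & Technology, Nanjing 210094; 2. School of Management and Engineering, Nanjing University, Nanjing 210093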

Abstract: This paper presents a robust unsupervised text clustering method based on a niching genetic algorithm, in which text clustering in feature space is transformed into a multimodal function optimization problem solved by genetic niching. The peaks of the multimodal function, which constitute the final text cluster centers, are identified by an improved deterministic crowding scheme. The fitness function is constructed from a density estimator of the data points. The niche radius is adjusted dynamically by an iterative hill-climbing method coupled with the genetic optimization of the text cluster centers. As a result, the number of text clusters is obtained adaptively rather than fixed in advance. The experimental results show that the algorithm is effective and efficient for text clustering.
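The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the arithmetic crossover, the fixed `radius`, the synthetic 2-D points standing in for text feature vectors, and all function names (`density_fitness`, `deterministic_crowding`, `extract_centers`) are assumptions made for illustration. In particular, the dynamic niche radius adjustment by hill climbing and the paper's improvements to deterministic crowding are not reproduced here.

```python
import numpy as np


def density_fitness(center, points, radius):
    """Gaussian-kernel density of `points` around `center` (assumed kernel)."""
    d2 = np.sum((points - center) ** 2, axis=1)
    return float(np.mean(np.exp(-d2 / (2.0 * radius ** 2))))


def deterministic_crowding(points, pop_size=20, generations=100,
                           radius=0.5, sigma=0.1, seed=0):
    """Plain deterministic crowding: each child competes with its nearer parent."""
    rng = np.random.default_rng(seed)
    dim = points.shape[1]
    # Start the population from randomly chosen data points.
    pop = points[rng.choice(len(points), size=pop_size, replace=False)].copy()
    fit = np.array([density_fitness(c, points, radius) for c in pop])
    for _ in range(generations):
        order = rng.permutation(pop_size)
        for i, j in zip(order[0::2], order[1::2]):
            # Arithmetic crossover followed by Gaussian mutation.
            a = rng.random()
            c1 = a * pop[i] + (1 - a) * pop[j] + rng.normal(0, sigma, dim)
            c2 = (1 - a) * pop[i] + a * pop[j] + rng.normal(0, sigma, dim)
            # Pair children with parents so the total distance is smallest.
            d_same = np.linalg.norm(pop[i] - c1) + np.linalg.norm(pop[j] - c2)
            d_cross = np.linalg.norm(pop[i] - c2) + np.linalg.norm(pop[j] - c1)
            pairs = [(i, c1), (j, c2)] if d_same <= d_cross else [(i, c2), (j, c1)]
            for p, child in pairs:
                f = density_fitness(child, points, radius)
                if f > fit[p]:  # a child replaces its parent only if fitter
                    pop[p], fit[p] = child, f
    return pop, fit


def extract_centers(pop, fit, merge_radius):
    """Keep the fittest individual of each niche, scanning in fitness order."""
    centers = []
    for idx in np.argsort(-fit):
        if all(np.linalg.norm(pop[idx] - c) > merge_radius for c in centers):
            centers.append(pop[idx])
    return np.array(centers)


# Synthetic stand-in for text feature vectors: two well-separated blobs.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                    rng.normal(5.0, 0.3, (50, 2))])
pop, fit = deterministic_crowding(points)
centers = extract_centers(pop, fit, merge_radius=1.5)
print(len(centers), "cluster centers found")
```

Because each child competes only with its nearer parent, individuals tend to stay in the niche where they started, so several density peaks can survive in one population. The number of clusters then falls out of `extract_centers` instead of being supplied as an input, which is the property the abstract emphasizes.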

Key words: Deterministic crowding; Text clustering; Multimodal function; Density estimator

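Abstract (Chinese version): To address the shortcomings of traditional algorithms such as c-means in text clustering, a new Chinese text clustering method based on a niching genetic algorithm is proposed, which transforms the clustering of a text collection into a multimodal function optimization problem. The peaks of the multimodal function represent the text cluster centers, so the number of clusters need not be given in advance. The paper describes how the fitness function is constructed and how the niche radius is estimated dynamically when the method is applied to text clustering. Experimental results show that the method improves the average accuracy of text clustering.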

Key words (Chinese version): Crowding niche; Text clustering; Multimodal function; Density estimation