Abstract:
This paper presents an unsupervised, robust text clustering method based on a niching genetic algorithm. Text clustering in feature space is transformed into a multimodal function optimization problem within the framework of genetic niching. The peaks of the multimodal function, which constitute the final text cluster centers, are identified by an improved deterministic crowding scheme. The fitness function is constructed from a density estimator of the data points. The niching radius is adjusted dynamically by an iterative hill-climbing method coupled with the genetic optimization of the text cluster centers. As a result, the number of text clusters is obtained adaptively rather than specified in advance. The experimental results show that the algorithm is effective and efficient for text clustering.
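The pipeline described above (density-based fitness, crowding-based niching, and peaks taken as cluster centers) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm. Document vectors are plain lists of floats, such as TF-IDF weights. The fitness is a Gaussian kernel density estimate. The crowding step is *standard* deterministic crowding, not the paper's improved variant. The niching radius is held fixed, whereas the paper adjusts it by hill climbing. All function and parameter names (`density_fitness`, `deterministic_crowding`, `extract_peaks`, `pop_size`, `mutation_sd`) are hypothetical.

```python
import math
import random


def distance(u, v):
    """Euclidean distance between two document vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def density_fitness(center, points, radius):
    """Hypothetical fitness: Gaussian kernel density estimate at `center`.

    Assumption: the kernel shape and the use of `radius` as its bandwidth
    are illustrative choices, not the paper's exact density estimator.
    """
    total = 0.0
    for p in points:
        d2 = sum((a - b) ** 2 for a, b in zip(center, p))
        total += math.exp(-d2 / (2.0 * radius ** 2))
    return total / len(points)


def deterministic_crowding(points, radius, pop_size=20, generations=50,
                           mutation_sd=0.1, seed=0):
    """Standard deterministic crowding (not the paper's improved variant)."""
    rng = random.Random(seed)
    # Initialise candidate cluster centres by sampling data points.
    pop = [list(rng.choice(points)) for _ in range(pop_size)]
    fit = [density_fitness(ind, points, radius) for ind in pop]
    for _ in range(generations):
        order = list(range(pop_size))
        rng.shuffle(order)
        for i in range(0, pop_size - 1, 2):
            a, b = order[i], order[i + 1]
            p1, p2 = pop[a], pop[b]
            # Arithmetic crossover followed by Gaussian mutation.
            w = rng.random()
            c1 = [w * x + (1 - w) * y + rng.gauss(0, mutation_sd)
                  for x, y in zip(p1, p2)]
            c2 = [(1 - w) * x + w * y + rng.gauss(0, mutation_sd)
                  for x, y in zip(p1, p2)]
            f1 = density_fitness(c1, points, radius)
            f2 = density_fitness(c2, points, radius)
            # Each child competes only with the parent closest to it, so
            # individuals stay on their own peak (this preserves niches).
            if (distance(p1, c1) + distance(p2, c2)
                    <= distance(p1, c2) + distance(p2, c1)):
                pairs = [(a, c1, f1), (b, c2, f2)]
            else:
                pairs = [(a, c2, f2), (b, c1, f1)]
            for idx, child, fc in pairs:
                if fc > fit[idx]:  # replace the parent only if the child is fitter
                    pop[idx], fit[idx] = child, fc
    return pop, fit


def extract_peaks(pop, fit, radius):
    """Greedy peak extraction: keep the fittest individual in each niche.

    The number of returned peaks (cluster centres) is not fixed in advance.
    """
    peaks = []
    for ind, f in sorted(zip(pop, fit), key=lambda t: -t[1]):
        if all(distance(ind, p) > radius for p in peaks):
            peaks.append(ind)
    return peaks
```

Because each child replaces only its nearest parent, and only when it is fitter, individuals climb the density peak they started on rather than converging to a single global optimum. As a result, `extract_peaks` can return as many centres as there are well-separated density peaks. In this sketch, `radius` serves both as the kernel bandwidth and as the niche separation distance, which is a simplification.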
Key words:
Deterministic crowding; Text clustering; Multimodal function; Density estimator
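Abstract (Chinese version): To address the shortcomings of traditional algorithms such as c-means in text clustering, a new Chinese text clustering method based on a niching genetic algorithm is proposed. The method transforms the clustering of a text collection into a multimodal function optimization problem. The peaks of the multimodal function represent the text cluster centers, so the number of clusters does not need to be specified in advance. The paper describes how the fitness function is constructed for text clustering and how the niching radius is estimated dynamically. Experimental results show that the method improves the average accuracy of text clustering.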
Key words (Chinese version):
Crowding niche; Text clustering; Multimodal function; Density estimation
ZHAO Yaqin, ZHOU Xianzhong. A Novel Text Clustering Algorithm Based on Niching Technique[J]. Computer Engineering, 2006, 32(6): 206-208.
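ZHAO Yaqin, ZHOU Xianzhong. A New Method for Chinese Text Clustering Based on a Niching Genetic Algorithm [J]. Computer Engineering, 2006, 32(6): 206-208. (Chinese-language title, translated)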