Abstract: Projection pursuit can effectively mitigate the curse of dimensionality in text categorization, and optimizing the projection direction is the key problem it must solve. Traditional projection pursuit methods treat projection index optimization as a single-objective problem rather than a multi-objective one, which can degrade the quality of the solution. To address this, this paper proposes a projection pursuit method based on multi-objective optimization. The distance between classes and the compactness of the data within each class are taken as the two objectives of the projection index, the projection is extended to multiple dimensions, and a Chaotic Particle Swarm Optimization (CPSO) algorithm is used to search for the optimal projection direction. Experiments on commonly used text datasets determine the optimal projection index and number of projection dimensions, and the classification results of different classification models are then compared. The results show that the proposed method effectively improves text categorization performance.
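The two objectives and the CPSO search can be illustrated with a minimal sketch. The abstract does not say how the two objectives are combined, how a particle encodes a projection, which PSO coefficients are used, or where the chaotic map enters, so everything below is an assumption. The between-class objective is the mean Euclidean distance between projected class centroids (maximized). The within-class objective is the mean distance of each projected sample to its own class centroid (minimized). They are merged into a single weighted-sum fitness `between - weight * within`, which is a simplification and not a Pareto-based multi-objective method. A particle is a flat vector reshaped into `k` unit-length projection directions. The logistic map `z -> 4z(1-z)` replaces uniform random numbers for initialization and for the PSO coefficients `r1`, `r2`. All function names (`objectives`, `cpso`, etc.) are hypothetical.

```python
import math
import random


def logistic_map(z):
    """One step of the logistic map z -> 4z(1-z), chaotic on (0, 1)."""
    return 4.0 * z * (1.0 - z)


def normalize_rows(flat, k, d):
    """Reshape a flat particle into k unit-length projection directions of length d."""
    rows = []
    for i in range(k):
        row = flat[i * d:(i + 1) * d]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard against a zero row
        rows.append([v / norm for v in row])
    return rows


def project(x, directions):
    """Project one sample onto each direction (a k-dimensional projection)."""
    return [sum(a * b for a, b in zip(w, x)) for w in directions]


def centroid(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def objectives(flat, X, y, k):
    """Return (between, within) for the projection encoded by `flat`.

    between: mean distance between class centroids (to maximize).
    within:  mean distance of samples to their own class centroid (to minimize).
    Assumes at least two classes.
    """
    dirs = normalize_rows(flat, k, len(X[0]))
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(project(x, dirs))
    cents = {c: centroid(pts) for c, pts in groups.items()}
    labels = sorted(cents)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    between = sum(math.dist(cents[a], cents[b]) for a, b in pairs) / len(pairs)
    within = sum(math.dist(p, cents[c]) for c, pts in groups.items() for p in pts) / len(X)
    return between, within


def cpso(X, y, k, n_particles=20, iters=100, weight=1.0, vmax=1.0, seed=0):
    """Chaotic PSO maximizing the assumed scalarization between - weight * within."""
    dim = k * len(X[0])
    rng = random.Random(seed)
    z = rng.uniform(0.01, 0.99)

    def chaos():
        nonlocal z
        z = logistic_map(z)
        # Re-seed if floating-point rounding lands on the boundary or the fixed point 0.75.
        if not (1e-12 < z < 1 - 1e-12) or z == 0.75:
            z = rng.uniform(0.01, 0.99)
        return z

    def fitness(p):
        b, w = objectives(p, X, y, k)
        return b - weight * w

    # Chaotic initialization of positions in [-1, 1].
    pos = [[2.0 * chaos() - 1.0 for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for t in range(iters):
        inertia = 0.9 - 0.5 * t / iters  # linearly decreasing inertia weight
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = chaos(), chaos()  # chaotic numbers replace uniform r1, r2
                v = (inertia * vel[i][j]
                     + 2.0 * r1 * (pbest[i][j] - pos[i][j])
                     + 2.0 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))  # velocity clamping
                pos[i][j] += vel[i][j]
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return normalize_rows(gbest, k, len(X[0])), gbest_f
```

For example, `cpso([[0, 0], [0, 1], [0, -1], [5, 0], [5, 1], [5, -1]], [0, 0, 0, 1, 1, 1], k=1)` should return a direction close to `[±1, 0]`, the axis that separates the two toy classes. Real text vectors would have thousands of dimensions, so a practical version would use vectorized linear algebra rather than pure Python loops.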
Key words:
projection pursuit,
text categorization,
curse of dimensionality,
projection index,
multi-objective optimization,
Chaotic Particle Swarm Optimization (CPSO) algorithm