Computer Engineering ›› 2009, Vol. 35 ›› Issue (22): 182-184. doi: 10.3969/j.issn.1000-3428.2009.22.062

• Artificial Intelligence and Recognition Technology •

Generation Method for Samples of Minority Class Based on Clustering and Genetic Crossover

DU Juan, YI Zhi-an, ZHOU Ying   

  1. (Institute of Computer and Information Technology, Daqing Petroleum Institute, Daqing 163318)
  • Online: 2009-11-20  Published: 2009-11-20

Abstract: When trained on imbalanced data sets, traditional classification algorithms bias their predictions toward the majority class, so the minority class suffers a large classification error. To address this problem, this paper proposes an over-sampling method for the minority class based on clustering and genetic crossover. The minority-class samples are first grouped into clusters with the K-means algorithm; genetic crossover is then applied within each cluster to generate new samples, which are checked for validity. Experimental results with K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers demonstrate the effectiveness of the method.

Key words: imbalanced data set, classification, clustering, genetic crossover
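
The procedure outlined in the abstract (cluster the minority class with K-means, recombine samples inside each cluster, keep only valid offspring) can be illustrated with a minimal Python sketch. The abstract does not specify the crossover operator or the validity test, so both are assumptions here: the sketch uses uniform crossover and accepts a child only if it lies closest to the centroid of its parents' cluster. All names and parameters (oversample_minority, n_new, k, seed, max_tries) are hypothetical and are not taken from the paper.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def crossover(parent_a, parent_b, rng):
    """Uniform crossover (assumed operator): copy each feature from one parent."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def oversample_minority(minority, n_new, k=2, seed=0, max_tries=10000):
    """Generate up to `n_new` synthetic minority samples, cluster by cluster."""
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, rng)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    # Only clusters with at least two members can supply a pair of parents.
    usable = [c for c in range(k) if len(clusters[c]) >= 2]
    synthetic, tries = [], 0
    while usable and len(synthetic) < n_new and tries < max_tries:
        tries += 1
        c = rng.choice(usable)
        parent_a, parent_b = rng.sample(clusters[c], 2)
        child = crossover(parent_a, parent_b, rng)
        # Assumed validity check: keep the child only if it is still
        # closest to the centroid of its parents' cluster.
        if nearest(child, centroids) == c:
            synthetic.append(child)
    return synthetic


minority = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
new_samples = oversample_minority(minority, n_new=4, k=2)
```

Restricting parents to a single cluster keeps offspring inside a region already occupied by minority samples. Uniform crossover can reproduce a parent unchanged, so a practical version might discard duplicates before appending the synthetic samples to the training set.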
