
Computer Engineering ›› 2012, Vol. 38 ›› Issue (04): 67-69. doi: 10.3969/j.issn.1000-3428.2012.04.022

• Software Technology and Database •

Improved Over-sampling Algorithm of Minority Class Sample

XU Dan-dan 1, CAI Li-jun 1, WANG Yong 2

  1. School of Science, Northwestern Polytechnical University, Xi’an 710129, China
  2. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2011-07-18 Online: 2012-02-20 Published: 2012-02-20
  • About the authors: XU Dan-dan (born 1984), female, master's student, main research interest: skewed data mining; CAI Li-jun and WANG Yong, associate professors
  • Supported by:
    National Natural Science Foundation of China (60873196)

Abstract: To address the classification of skewed datasets, this paper proposes an improved over-sampling algorithm for minority class samples, named B-ISMOTE. It generates virtual minority class instances by random interpolation within the n-dimensional sphere space formed by each borderline minority class instance and its nearest neighbor instances, thereby reducing the degree of imbalance in the data. Experimental results on real datasets show that B-ISMOTE achieves better classification performance than the SMOTE and B-SMOTE algorithms.
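The abstract gives only the outline of the method, so the following is a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes that borderline instances are selected with the usual Borderline-SMOTE rule (at least half, but not all, of the m nearest neighbors belong to the majority class) and that the sphere is centered on the borderline instance with a radius equal to its distance to a randomly chosen minority neighbor. The function names `borderline_indices`, `sample_in_ball`, and `b_ismote_sketch`, and the parameters `m`, `k`, and `seed`, are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def borderline_indices(minority, majority, m=5):
    """Indices of minority points whose m nearest neighbours contain
    at least m/2 but fewer than m majority points (assumed rule)."""
    danger = []
    for i, p in enumerate(minority):
        # Label 0 = minority neighbour, 1 = majority neighbour.
        neighbours = [(euclidean(p, q), 0)
                      for j, q in enumerate(minority) if j != i]
        neighbours += [(euclidean(p, q), 1) for q in majority]
        neighbours.sort()
        n_major = sum(label for _, label in neighbours[:m])
        if m / 2 <= n_major < m:
            danger.append(i)
    return danger


def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from the n-dimensional ball around center."""
    n = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # u ** (1/n) makes the radial distance uniform over the ball's volume.
    scale = radius * rng.random() ** (1.0 / n) / norm
    return [c + d * scale for c, d in zip(center, direction)]


def b_ismote_sketch(minority, majority, n_new, m=5, k=5, seed=0):
    """Generate n_new virtual minority instances around borderline points.

    Assumption: the sphere is centered on the borderline instance and its
    radius is the distance to one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    danger = borderline_indices(minority, majority, m)
    if not danger:
        return []
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        p = minority[i]
        others = [q for j, q in enumerate(minority) if j != i]
        nearest = sorted(others, key=lambda q: euclidean(p, q))[:k]
        q = rng.choice(nearest)
        synthetic.append(sample_in_ball(p, euclidean(p, q), rng))
    return synthetic
```

Unlike SMOTE, which places each virtual instance on the line segment between an instance and its neighbor, this sketch can place it anywhere inside the sphere, which is the difference the abstract emphasizes.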

Key words: skewed dataset, classification, over-sampling, virtual instance, n-dimensional sphere space
