
Computer Engineering ›› 2011, Vol. 37 ›› Issue (6): 157-158. doi: 10.3969/j.issn.1000-3428.2011.06.054

• Artificial Intelligence and Recognition Technology •

FE-SVDD Algorithm for Imbalanced Data Classification

FANG Jing-long 1,2, WANG Wan-liang 1, HE Wei-cheng 2

  (1. School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; 2. Institute of Graphic and Image, Hangzhou Dianzi University, Hangzhou 310018, China)
  • Online: 2011-03-20 Published: 2011-03-29
  • About the authors: FANG Jing-long (born 1964), male, research fellow, main research interests: machine learning and target detection; WANG Wan-liang, professor, Ph.D.; HE Wei-cheng, master's student
  • Funding: National Natural Science Foundation of China (60874074); Key Project of the Zhejiang Provincial Science and Technology Plan (2009C14032)

Abstract:

Existing Support Vector Data Description (SVDD) algorithms are usually biased when applied to imbalanced data sets. To address this problem, this paper proposes the FE-SVDD algorithm for imbalanced data classification, building on Principal Component Analysis (PCA) feature extraction and SVDD classification theory. The method applies PCA to the samples of each of the two classes separately to obtain their principal eigenvalues, and then redefines the penalty parameter C of SVDD according to the sample sizes and these eigenvalues. Experiments on artificial data sets and UCI data sets verify the effectiveness of the method.

Key words: pattern classification, Support Vector Data Description (SVDD), imbalanced data sets, feature extraction, Principal Component Analysis (PCA)
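
The abstract does not state the formula used to redefine C, so the following is only a minimal sketch of the general idea: run PCA on each class separately, take the leading eigenvalues, and derive one penalty per class from the sample sizes and those eigenvalues. The function names, the `base_c` parameter, and the weighting rule itself are assumptions made for illustration and should not be read as the paper's definition.

```python
import numpy as np


def principal_eigenvalues(samples, n_components=2):
    """Return the largest eigenvalues of the sample covariance matrix (PCA)."""
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / (len(samples) - 1)
    eigenvalues = np.linalg.eigvalsh(cov)  # returned in ascending order
    return eigenvalues[::-1][:n_components]


def class_penalties(majority, minority, base_c=1.0, n_components=2):
    """Hypothetical weighting rule, NOT the formula from the paper.

    Each class is given a weight of (sample count) x (sum of leading
    eigenvalues). A class's penalty is proportional to the other class's
    weight, so the smaller or more compact class gets the larger penalty.
    """
    w_maj = len(majority) * principal_eigenvalues(majority, n_components).sum()
    w_min = len(minority) * principal_eigenvalues(minority, n_components).sum()
    total = w_maj + w_min
    return base_c * w_min / total, base_c * w_maj / total


# Synthetic imbalanced data: 500 majority samples versus 50 minority samples.
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(500, 2))
minority = rng.normal(3.0, 0.5, size=(50, 2))

c_majority, c_minority = class_penalties(majority, minority)
print(f"C (majority) = {c_majority:.4f}, C (minority) = {c_minority:.4f}")
```

Under this assumed rule, errors on the minority class are penalized more heavily than errors on the majority class, which is one way to counter the bias the abstract describes. The two values would then be passed as per-class penalties to an SVDD solver.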
