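Abstract (Chinese version):
Existing Support Vector Data Description (SVDD) algorithms are usually biased when applied to imbalanced data sets. To address this problem, and building on Principal Component Analysis (PCA) feature extraction and SVDD classification theory, this paper proposes the FE-SVDD algorithm for imbalanced data classification. The method performs principal component analysis on the two classes of sample data, obtains the principal eigenvalues of each class, and redefines the C value in SVDD according to the sample sizes and the eigenvalues. Experiments on artificial sample sets and UCI data sets verify the effectiveness of the method.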
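Keywords (Chinese version):
pattern classification,
support vector data description,
imbalanced data sets,
feature extraction,
principal component analysis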
Abstract:
Existing Support Vector Data Description (SVDD) algorithms are usually biased when applied to imbalanced data sets. To address this problem, this paper proposes the FE-SVDD algorithm for improved imbalanced data classification, which introduces a feature extraction method based on Principal Component Analysis (PCA). The algorithm applies PCA to each of the two classes of samples to find their principal eigenvalues. The penalty parameter C of SVDD is then redefined using the sizes of the two sample sets and their eigenvalues. Experimental results on artificial data sets and UCI data sets show the method's effectiveness.
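The abstract does not state the exact formula used to redefine C, so the following is only a minimal sketch of the general idea: estimate the largest covariance eigenvalue of each class (the variance along its first principal component) and combine it with the class size to obtain class-specific penalties. The function names, the `base_C` parameter, and the weighting rule itself are assumptions made for illustration and are not taken from the paper.

```python
def covariance(X):
    """Sample covariance matrix of X, given as a list of equal-length rows."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_eigenvalue(X, iters=200):
    """Largest covariance eigenvalue of X, estimated by power iteration.

    Simplification: assumes the all-ones start vector is not orthogonal
    to the leading eigenvector.
    """
    cov = covariance(X)
    d = len(cov)
    v = [1.0] * d
    lam = 0.0
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0  # zero covariance: no spread at all
        v = [x / norm for x in w]
        lam = norm
    return lam


def class_penalties(X_major, X_minor, base_C=1.0):
    """Hypothetical weighting rule (NOT the paper's formula).

    Each class gets a score n * lambda (sample size times leading
    eigenvalue). base_C is split so that the class with the smaller
    score receives the larger penalty.
    """
    s_major = len(X_major) * top_eigenvalue(X_major)
    s_minor = len(X_minor) * top_eigenvalue(X_minor)
    total = s_major + s_minor
    return base_C * s_minor / total, base_C * s_major / total


# Usage: a larger and a smaller class with the same value pattern.
X_major = [[float(i % 5), float(i % 3)] for i in range(30)]
X_minor = [[float(i % 5), float(i % 3)] for i in range(15)]
C_major, C_minor = class_penalties(X_major, X_minor)
print(C_major, C_minor)
```

Under this assumed rule the two penalties always sum to `base_C`, and the smaller class is penalized more heavily for errors, which is the usual way to counter the bias toward the majority class.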
Key words:
pattern classification,
Support Vector Data Description (SVDD),
imbalanced data sets,
feature extraction,
Principal Component Analysis (PCA)
Chinese Library Classification (CLC) number:
FANG Jing-Long (方景龙), WANG Wan-Liang (王万良), HE Wei-Cheng (何伟成). FE-SVDD Algorithm for Imbalanced Data Classification[J]. Computer Engineering (计算机工程), 2011, 37(6): 157-158.