Abstract: Traditional Kernel Principal Component Analysis (KPCA) projects the original data into a high-dimensional space through an implicit real-valued mapping before reducing its dimensionality, which increases the time needed to search for the separating hyperplane and lowers classification accuracy. To address this, a dimensionality reduction method based on principal component analysis in a Reproducing Kernel Hilbert Space (RKHS) is proposed: the original data are projected through an explicit continuous-valued mapping into a high-dimensional or infinite-dimensional RKHS, where the reduction is then carried out. Experimental results on real datasets show that the method improves classification accuracy and reduces running time.
Key words: data mining, dimensionality reduction, Hilbert space, Principal Component Analysis (PCA)
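The abstract does not say which explicit mapping the method uses, so the following is only a minimal sketch of the general idea (map the data explicitly into a feature space, then run ordinary PCA there), not the authors' algorithm. As a stand-in it assumes random Fourier features, which give an explicit approximation of the Gaussian kernel; the function names, `gamma`, `n_features`, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def explicit_rbf_features(X, n_features=64, gamma=0.5, seed=0):
    """Explicit map z(x) with z(x).z(y) approximating exp(-gamma * ||x - y||^2).

    Assumption: random Fourier features stand in for the paper's
    unspecified explicit mapping.
    """
    rng = random.Random(seed)
    d = len(X[0])
    sigma = math.sqrt(2.0 * gamma)  # std of the random frequencies
    W = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [
        [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bj)
         for w, bj in zip(W, b)]
        for x in X
    ]


def pca_reduce(Z, n_components, n_iter=100, seed=0):
    """Project the rows of Z onto its leading principal directions.

    Uses power iteration with deflation on the covariance of the
    centered features, so no external linear algebra library is needed.
    """
    rng = random.Random(seed)
    n, m = len(Z), len(Z[0])
    mean = [sum(row[j] for row in Z) / n for j in range(m)]
    Zc = [[row[j] - mean[j] for j in range(m)] for row in Z]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(n_iter):
            # Apply the covariance matrix: C v = Zc^T (Zc v) / (n - 1).
            s = [sum(a * c for a, c in zip(row, v)) for row in Zc]
            v = [sum(s[i] * Zc[i][j] for i in range(n)) / (n - 1)
                 for j in range(m)]
            # Deflate: remove the directions already found.
            for u in components:
                dot = sum(a * c for a, c in zip(v, u))
                v = [a - dot * c for a, c in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        components.append(v)
    return [[sum(a * c for a, c in zip(row, u)) for u in components]
            for row in Zc]


# Synthetic data, for illustration only: 40 points with 3 attributes.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]

Z = explicit_rbf_features(X, n_features=64, gamma=0.5)  # explicit feature map
Y = pca_reduce(Z, n_components=2)                       # reduced representation
print(len(Y), len(Y[0]))
```

Because the mapped features `Z` are available explicitly, the reduction step is plain linear PCA on `Z`, and a linear classifier could be trained directly on `Y`; this avoids building and decomposing the full kernel matrix that standard KPCA requires.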
HUANG Gan-Ji, LV Yue-Jin. Dimensionality Reduction Based on Principal Component Analysis in Reproducing Kernel Hilbert Space[J]. Computer Engineering, 2011, 37(10): 52-54.