Abstract: To improve the efficiency of outlier mining on high-dimensional data sets, this paper analyzes traditional outlier mining algorithms and proposes an outlier detection algorithm. The algorithm transforms a nonlinear problem into a linear problem in a high-dimensional feature space and reduces the dimensionality with a kernel function combined with principal component analysis. It then scans the projection components of each data object one by one to decide whether that data point is an outlier. The algorithm applies to outlier detection in both linearly separable and linearly inseparable data sets. Experimental results indicate the superiority of the algorithm.
Key words:
dimension reduction,
kernel function,
principal component
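The pipeline outlined in the abstract (kernel mapping, principal-component reduction, then a scan over projection components) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian (RBF) kernel, the parameters `gamma`, `n_components` and `z_threshold`, the z-score decision rule, and all function names below are assumptions made for illustration, since the abstract does not specify the kernel or the outlier criterion.

```python
import math

def rbf_kernel_matrix(points, gamma):
    """Gaussian (RBF) kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2). Assumed kernel choice."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-gamma * sq_dist(p, q)) for q in points] for p in points]

def center_kernel(K):
    """Double-center K so the implicitly mapped data has zero mean in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # equals the column means because K is symmetric
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenpairs(M, k, iters=300):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = mat_vec(M, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(M, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs

def detect_outliers(points, gamma=1.0, n_components=1, z_threshold=2.5):
    """Flag indices whose projection on any retained component is extreme.

    The z-score rule is an assumed stand-in for the paper's decision criterion.
    """
    Kc = center_kernel(rbf_kernel_matrix(points, gamma))
    flagged = set()
    for lam, v in top_eigenpairs(Kc, n_components):
        if lam <= 1e-9:
            continue  # component carries no variance
        scores = [math.sqrt(lam) * x for x in v]  # projections of the training points
        mean = sum(scores) / len(scores)
        std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
        if std == 0:
            continue
        for idx, s in enumerate(scores):  # scan the component one point at a time
            if abs(s - mean) / std > z_threshold:
                flagged.add(idx)
    return sorted(flagged)

# Hypothetical toy data: ten points clustered near the origin and one distant point.
points = [(x, y) for x in (-0.1, 0.0, 0.1) for y in (-0.1, 0.0, 0.1)]
points += [(0.05, 0.05), (5.0, 5.0)]
outliers = detect_outliers(points)
print(outliers)
```

On this toy set the distant point at index 10 dominates the leading component, so it is the one flagged. For real data, `gamma`, the number of retained components, and the threshold would all need tuning, and a library eigen-solver would be preferable to the plain power iteration used here to keep the sketch dependency-free.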
CLC number:
XU Xue-song; LIU Yao-zong; ZHAO Xue-long; ZHANG Hong; LIU Feng-yu. Outliers Detection Based on Kernel Function-Principle Component Dimension Reduction[J]. Computer Engineering, 2008, 34(8): 82-84.