Abstract:
To address the problems that the traditional SOD outlier detection algorithm encounters when processing high-dimensional data, this paper presents an improved algorithm. By quantifying the degree of aggregation in each dimension, the algorithm determines the reference value of each dimension, which reduces the sensitivity of the results to parameter settings. It uses relative distance to express each point's deviation from the central value, which makes it better suited to detecting outliers in subspaces of different densities. Simulation results show that the improved algorithm achieves higher detection accuracy than the traditional SOD algorithm.
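The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: weighting each dimension by how tightly the reference points aggregate in it, and scoring a point by its relative rather than absolute distance from the center. The function name `relative_deviation_score`, the weight `1 / (1 + variance)`, and the normalization by standard deviation are all illustrative assumptions, not the authors' definitions.

```python
import math


def relative_deviation_score(point, reference, eps=1e-9):
    """Score how far `point` deviates from a set of `reference` points.

    Assumed scheme (hypothetical, not taken from the paper):
      * per-dimension weight = 1 / (1 + variance), normalised to sum to 1,
        so dimensions in which the reference points aggregate tightly
        contribute more to the score;
      * per-dimension relative distance = |x - mean| / (std + eps),
        so dense and sparse neighbourhoods are scored on a comparable scale.
    """
    dims = len(point)
    n = len(reference)

    # Center and spread of the reference set in every dimension.
    means = [sum(r[d] for r in reference) / n for d in range(dims)]
    variances = [
        sum((r[d] - means[d]) ** 2 for r in reference) / n for d in range(dims)
    ]

    # Aggregation-based weights: low variance gives a high weight.
    raw = [1.0 / (1.0 + v) for v in variances]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Weighted sum of relative (spread-normalised) deviations.
    return sum(
        weights[d] * abs(point[d] - means[d]) / (math.sqrt(variances[d]) + eps)
        for d in range(dims)
    )


# Reference points aggregate tightly in dimension 0 and spread out in dimension 1.
reference = [(1.0, 0.0), (1.1, 5.0), (0.9, 10.0), (1.0, 15.0), (1.0, 20.0)]
inlier = (1.0, 12.0)   # ordinary in both dimensions
outlier = (3.0, 12.0)  # deviates only in the tightly aggregated dimension

print(relative_deviation_score(inlier, reference))
print(relative_deviation_score(outlier, reference))
```

In this toy data the second point differs from the first only in dimension 0, where the reference set is tightly aggregated, so it receives the larger score. Selecting the reference set itself (for example via shared nearest neighbors, as in SOD) is outside the scope of this sketch.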
Key words:
high-dimensional data,
subspace,
outlier detection,
data mining
摘要: 针对传统SOD孤立点检测算法在处理高维数据时存在的问题,提出一种改进算法。通过对每一维的聚集度进行量化,确定各维的参考价值,从而降低算法结果对参数设定的敏感度,利用相对距离表示各点到中心值的偏离度,使其更利于不同密度子空间的孤立点检测。仿真实验结果表明,改进算法的检测精度优于传统SOD算法。
关键词:
高维数据,
子空间,
孤立点检测,
数据挖掘
LIU Wen-Yuan, ZHANG Liang, SUN De-Jie, CHEN Zi-Jun. Improved SOD Outlier Detection Algorithm[J]. Computer Engineering, 2011, 37(9): 93-94,97.
刘文远, 张亮, 孙德杰, 陈子军. 改进的SOD孤立点检测算法[J]. 计算机工程, 2011, 37(9): 93-94,97.