Computer Engineering

• Artificial Intelligence and Recognition Technology •

Outlier Detection Algorithm Based on Approximate Outlier Factor

XIE Yue-shan a, FAN Xiao-ping a, LIAO Zhi-fang b, ZHOU Guo-en b, LIU Shi-jie a

  1. (a. School of Information Science and Engineering; b. School of Software, Central South University, Changsha 410075, China)
  • Received: 2012-09-10 Online: 2013-11-15 Published: 2013-11-13
  • About the authors: XIE Yue-shan (born 1965), male, Ph.D., main research interest: data mining; FAN Xiao-ping, professor and doctoral supervisor; LIAO Zhi-fang (corresponding author), associate professor, Ph.D.; ZHOU Guo-en, master's student; LIU Shi-jie, master's degree
  • Supported by:
    National Key Technology R&D Program of China (2012BAH08B01); Natural Science Foundation of Hunan Province (12JJ3074)

Abstract: Clustering-based outlier detection algorithms produce results that are coarse and not very accurate. To address this problem, this paper proposes an outlier detection algorithm based on the Approximate Outlier Factor (AOF). The algorithm defines a similarity distance and an outlier similarity coefficient, and gives a pruning strategy based on the similarity distance that shrinks the candidate set of suspected outliers and thereby lowers the computational complexity of detection. Experiments on the public datasets Iris, Labor, and Segment-test show that the algorithm is more effective than classical outlier detection algorithms at finding outliers and at reducing the candidate set.

Key words: clustering outlier, outlier detection, Approximate Outlier Factor (AOF), pruning strategy, outlier candidate set

CLC Number: