
Computer Engineering, 2013, Vol. 39, Issue (3): 46-50, 55. doi: 10.3969/j.issn.1000-3428.2013.03.010

• Software Technology and Database •

An Outlier Detection Method Based on Probability

ZHANG Yue, LIU Jie, LI Hang

  1. (Software College, Shenyang Normal University, Shenyang 110034, China)
  • Received: 2012-04-18 Published: 2013-03-15 Online: 2013-03-13
  • About the authors: ZHANG Yue (born 1980), female, lecturer, M.S.; main research interest: data mining. LIU Jie, professor, M.S. LI Hang, professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (60970112)

Abstract: Most existing outlier detection methods require the number of outliers to be specified in advance, and an inaccurate setting lowers detection accuracy. To address this problem, a probability-based outlier detection method is proposed. The method combines the density-based DBSCAN algorithm with a variance-from-median method to cluster the data set under test. It then extracts the suspicious outliers that do not belong to any cluster and checks them against the definition of an outlier to determine the final outliers. The data processed by the method are linearly independent of the time factor. The number of outliers and the number of clusters need not be set in advance, and the method is robust to noisy data. Experimental results on the IRIS data set show that the method identifies outliers effectively.

Key words: outlier, probability, median, DBSCAN algorithm, variance, clustering
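The two-stage idea in the abstract can be sketched as follows: cluster with DBSCAN, treat the points left outside every cluster as suspicious, then confirm them. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the variance-from-median step is combined with DBSCAN or what the probability-based confirmation rule is. The function `detect_outliers`, its threshold `median + z * sqrt(median_variance)` over nearest-neighbour distances, and the parameters `eps`, `min_pts` and `z` are therefore hypothetical choices made for illustration. The DBSCAN part follows the standard algorithm, in which a core point has at least `min_pts` neighbours within distance `eps`.

```python
import math
from statistics import median

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], i included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one cluster label per point, NOISE (-1) for unclustered points."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


def median_variance(values):
    """Mean squared deviation from the median (instead of from the mean)."""
    m = median(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def detect_outliers(points, eps, min_pts, z=3.0):
    """Return (labels, suspicious, confirmed).

    HYPOTHETICAL confirmation rule, standing in for the paper's
    probability-based analysis: a DBSCAN noise point is confirmed when its
    nearest-neighbour distance exceeds median + z * sqrt(median_variance)
    computed over the nearest-neighbour distances of all points.
    """
    labels = dbscan(points, eps, min_pts)
    suspicious = [i for i, lab in enumerate(labels) if lab == NOISE]
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    threshold = median(nn) + z * math.sqrt(median_variance(nn))
    confirmed = [i for i in suspicious if nn[i] > threshold]
    return labels, suspicious, confirmed


# Usage: two 3x3 grids plus a nearby stray point and a distant one.
grid_a = [(x, y) for x in range(3) for y in range(3)]
grid_b = [(x, y) for x in range(10, 13) for y in range(10, 13)]
points = grid_a + grid_b + [(4, 1), (5, 30)]
labels, suspicious, confirmed = detect_outliers(points, eps=1.5, min_pts=3)
print(suspicious, confirmed)  # [18, 19] [19]
```

In this toy example DBSCAN leaves (4, 1) and (5, 30) outside both clusters. The hypothetical variance-from-median threshold is roughly 12.9 here, so only the distant point (5, 30) is kept as a final outlier. This mirrors the split between suspicious and final outliers described in the abstract. Because the threshold uses squared deviations from the median, a single extreme value still inflates it. With very few points and a large `z`, no point may be confirmed.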
