Outlier Detection Algorithm Based on Approximate Outlier Factor

doi:10.3969/j.issn.1000-3428.2013.11.045

Computer Engineering


XIE Yue-shan ^a, FAN Xiao-ping ^a, LIAO Zhi-fang ^b, ZHOU Guo-en ^b, LIU Shi-jie ^a

(a. School of Information Science and Engineering; b. School of Software, Central South University, Changsha 410075, China)

Received: 2012-09-10  Online: 2013-11-15  Published: 2013-11-13

About the authors: XIE Yue-shan (born 1965), male, PhD, main research interest: data mining; FAN Xiao-ping, professor and doctoral supervisor; LIAO Zhi-fang (corresponding author), associate professor, PhD; ZHOU Guo-en, master's student; LIU Shi-jie, MS.

Funding: National Key Technology Support Program of China (2012BAH08B01); Natural Science Foundation of Hunan Province (12JJ3074)

Abstract

Outlier detection algorithms based on clustering produce coarse results that are not accurate enough. To address this problem, this paper proposes an outlier detection algorithm based on the Approximate Outlier Factor (AOF). The paper defines a similarity distance and a similarity-based outlier coefficient, and gives a pruning strategy based on the similarity distance. This strategy shrinks the set of suspected outlier candidates and lowers the computational complexity of outlier detection. Experiments on the public datasets Iris, Labor and Segment-test show that, compared with classical outlier detection algorithms, the proposed algorithm is more effective at finding outliers and at reducing the candidate set.

Key words: clustering outlier, outlier detection, Approximate Outlier Factor (AOF), pruning strategy, outlier candidate set
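The abstract names the ingredients of the method (a distance measure, an outlier score, and a distance-based pruning rule that shrinks the candidate set) but does not define them. The sketch below therefore does not reproduce the AOF. Instead, it illustrates a different, well-known technique built from the same ingredients: it ranks points by the distance to their k-th nearest neighbour and prunes a point as soon as that point provably cannot enter the top n. The function `top_n_knn_outliers` and its parameters `k` and `n` are hypothetical names chosen for this example, and Euclidean distance stands in for the paper's similarity distance.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    # Stand-in distance; NOT the paper's "similarity distance" (undefined in the abstract).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n_knn_outliers(data: List[Point], k: int, n: int) -> List[Tuple[float, int]]:
    """Hypothetical sketch, not the AOF algorithm: return the n points with the
    largest distance to their k-th nearest neighbour as (score, index) pairs,
    sorted by descending score, pruning candidates that cannot make the top n."""
    if not 1 <= k < len(data) or n < 1:
        raise ValueError("need 1 <= k < len(data) and n >= 1")
    top: List[Tuple[float, int]] = []  # min-heap of the current top-n (score, index)
    cutoff = 0.0  # weakest score in `top` once it holds n entries
    for i, p in enumerate(data):
        neigh: List[float] = []  # negated distances: a max-heap of the k smallest so far
        pruned = False
        for j, q in enumerate(data):
            if i == j:
                continue
            d = euclidean(p, q)
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # -neigh[0] only shrinks as more points are seen, so it bounds p's final
            # k-NN distance from above; below the cutoff, p can never reach the top n.
            if len(neigh) == k and len(top) == n and -neigh[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -neigh[0]  # exact k-NN distance: every other point was examined
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


if __name__ == "__main__":
    square_plus_far_point = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_knn_outliers(square_plus_far_point, k=2, n=1))
```

For the four corners of a unit square plus the point (10, 10), with `k=2` and `n=1`, the function reports index 4 (the far point) as the single outlier. Pruning can only fire once n scores are known, so it saves the most work when points are visited in random order. The same effect motivates the abstract's goal of shrinking the candidate set before computing outlier scores.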

CLC number: TP311

Cite this article: XIE Yue-shan, FAN Xiao-ping, LIAO Zhi-fang, ZHOU Guo-en, LIU Shi-jie. Outlier Detection Algorithm Based on Approximate Outlier Factor[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2013.11.045.

http://www.ecice06.com/CN/Y2013/V39/I11/200

