Computer Engineering ›› 2013, Vol. 39 ›› Issue (8): 60-63, 68. doi: 10.3969/j.issn.1000-3428.2013.08.012

• Advanced Computing and Data Processing •

An Outlier Data Mining Algorithm Based on Clustering and Rapid Calculation

MENG Jing, WU Xi-sheng

  1. (College of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China)
  • Received: 2012-03-27 Online: 2013-08-15 Published: 2013-08-13
  • About the authors: MENG Jing (born 1988), female, master's student, main research interests: artificial intelligence and data mining; WU Xi-sheng, professor, Ph.D.
  • Supported by:
    Jiangsu Province 333 High-Level Talents Project Fund (BRA2010128)

Abstract: In a dynamic incremental database environment, the traditional Local Outlier Factor (LOF) algorithm must recompute the local outlier factor of every data object whenever outlier mining is run a second time, which is inefficient. To address this problem, this paper proposes an outlier mining algorithm based on clustering and rapid calculation. It improves the traditional DBSCAN algorithm and, building on the clusters formed by the improved algorithm, computes the local outlier factor for only a subset of the data objects. Experimental results show that, in a dynamic incremental database environment, the algorithm outperforms the LOF and IncLOF algorithms in both computation time and the accuracy of outlier mining.

Key words: dynamic incremental database, Local Outlier Factor (LOF) algorithm, IncLOF algorithm, DBSCAN algorithm, clustering
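
The abstract's central idea, scoring only part of the dataset with LOF after a density-based pass, can be illustrated with a minimal sketch. It uses the standard LOF definition. The paper's improved DBSCAN is not described on this page, so the candidate-selection step below is only a stand-in assumption: it picks DBSCAN-style non-core points. The function names `k_nearest`, `lof_scores`, and `non_core_points`, and the parameter values, are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Return (k-distance of point i, indices in its k-distance neighborhood)."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    # Ties at the k-distance are included, as in the standard LOF definition.
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k, targets=None):
    """Standard LOF for the indices in `targets` (default: every point).

    This sketch still builds neighborhoods for all points; it only limits
    which points receive a final LOF score. It assumes no duplicate points.
    """
    info = [k_nearest(points, i, k) for i in range(len(points))]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        _, nbrs = info[i]
        reach = [max(info[j][0], math.dist(points[i], points[j])) for j in nbrs]
        return len(nbrs) / sum(reach)

    targets = range(len(points)) if targets is None else targets
    scores = {}
    for i in targets:
        _, nbrs = info[i]
        scores[i] = sum(lrd(j) for j in nbrs) / (len(nbrs) * lrd(i))
    return scores


def non_core_points(points, eps, min_pts):
    """Stand-in candidate filter (an assumption, not the paper's method):
    indices whose eps-neighborhood, counting the point itself, holds fewer
    than min_pts points, i.e. non-core points in DBSCAN terms."""
    return [i for i, p in enumerate(points)
            if sum(math.dist(p, q) <= eps for q in points) < min_pts]


# A 3x3 grid cluster plus one distant point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
candidates = non_core_points(points, eps=1.5, min_pts=4)
scores = lof_scores(points, k=3, targets=candidates)
print(candidates, scores)
```

Points inside a dense cluster get LOF values close to 1, so skipping them changes little in the final ranking; only the sparse candidates need a score.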
