摘要: 传统局部离群因子(LOF)算法在动态增量数据库环境下,进行二次异常数据挖掘需重新计算所有数据对象局部偏离因子,存在效率较低的问题。为此,提出一种基于聚类和快速计算的异常数据挖掘算法。对传统DBSCAN算法进行改进,并且在该改进算法聚类的基础上,仅对部分数据对象计算局部偏离因子。实验结果表明,该算法在动态增量数据库环境下,与LOF与lncLOF算法相比,不仅计算时间效率高,而且能提高挖掘异常数据的精度。
关键词: 动态增量数据库; 局部离群因子算法; lncLOF算法; DBSCAN算法; 聚类
Abstract: In a dynamic incremental database environment, the traditional Local Outlier Factor (LOF) algorithm must recalculate the local outlier factors of all data objects whenever a second round of outlier mining is performed, which is inefficient. To address this, this paper proposes an outlier mining algorithm based on clustering and rapid calculation. The traditional DBSCAN algorithm is improved, and, on the basis of the clusters produced by the improved algorithm, local outlier factors are computed for only a subset of the data objects. Experimental results show that, in a dynamic incremental database environment, the proposed algorithm is both faster than the LOF and lncLOF algorithms and more accurate at mining outliers.
Key words: dynamic incremental database; Local Outlier Factor (LOF) algorithm; lncLOF algorithm; DBSCAN algorithm; clustering
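The general idea stated in the abstract (cluster first, then compute local outlier factors for only part of the data) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not describe the improvement made to DBSCAN or how the subset of objects is chosen, so the sketch uses plain DBSCAN and assumes that the candidates are the points DBSCAN labels as noise. The function names (dbscan, lof, mine_outliers), the parameters, and the threshold are hypothetical, and the LOF routine takes exactly k neighbours (ignoring distance ties) and assumes no duplicate points.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j in range(len(points))
            if math.dist(points[i], points[j]) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN (not the paper's improved variant). Label -1 means noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1          # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue             # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lof(points, i, k):
    """Local outlier factor of points[i] (simplified: exactly k neighbours)."""
    def k_distance(p):
        return math.dist(points[p], points[k_nearest(points, p, k)[-1]])

    def lrd(p):
        # Local reachability density: inverse of the mean reachability distance.
        nbrs = k_nearest(points, p, k)
        reach = [max(k_distance(o), math.dist(points[p], points[o])) for o in nbrs]
        return len(nbrs) / sum(reach)

    nbrs = k_nearest(points, i, k)
    return sum(lrd(o) for o in nbrs) / (len(nbrs) * lrd(i))


def mine_outliers(points, eps, min_pts, k, threshold):
    """Cluster, then score only the candidates (assumption: DBSCAN noise points)."""
    labels = dbscan(points, eps, min_pts)
    candidates = [i for i, label in enumerate(labels) if label == -1]
    scores = {i: lof(points, i, k) for i in candidates}
    return {i: s for i, s in scores.items() if s > threshold}


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
            (5, 20)]
    print(mine_outliers(data, eps=1.5, min_pts=3, k=3, threshold=1.5))
```

In this sketch the LOF computation is skipped for every point that DBSCAN assigns to a cluster, which is where the time saving over scoring all objects would come from. How well that matches the paper's candidate selection cannot be determined from the abstract alone.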
孟静, 吴锡生. 一种基于聚类和快速计算的异常数据挖掘算法[J]. 计算机工程, 2013, 39(8): 60-63,68.
MENG Jing, WU Xi-Sheng. An Outlier Data Mining Algorithm Based on Clustering and Rapid Calculation[J]. Computer Engineering, 2013, 39(8): 60-63,68.