Abstract:
To improve the computational efficiency of existing subspace clustering algorithms, this paper gives a general definition of fault-tolerant subspace clustering in which the number of missing values is constrained by object, dimension, and pattern tolerances together with a relative threshold, and proves that this definition is monotonic. It then proposes a fault-tolerant subspace clustering algorithm for handling missing values in constrained attributes. The algorithm searches the subspace grid depth-first and deletes redundant low-dimensional clusters, which avoids traversing every subspace and improves clustering efficiency. Experimental results on real and synthetic data show that the algorithm runs 60%~90% faster on average than the CLIQUE and SCHISM clustering algorithms, and that it obtains subspace clustering results quickly, with high clustering quality, even when values are missing.
Key words: data analysis, multi-attribute, missing value, clustering, monotonicity, fault-tolerance
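The abstract does not state the formal cluster definition, so the following is only a minimal Python sketch of the general idea it describes: a depth-first search over subspaces in which a monotone support condition allows whole branches to be pruned, with missing values tolerated up to a limit. Everything here is an assumption made for illustration and is not the authors' algorithm: the function name `dense_cells`, the parameters `min_support` and `max_missing`, the use of categorical values with `None` for a missing value, and the simplification of the tolerance constraints to a single absolute cap on missing values per object. The removal of redundant low-dimensional clusters is omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[Optional[str]]  # None marks a missing value
# (dimensions, cell values on those dimensions, ids of supporting objects)
Cluster = Tuple[Tuple[int, ...], Tuple[str, ...], Tuple[int, ...]]


def dense_cells(rows: List[Row], min_support: int, max_missing: int) -> List[Cluster]:
    """Depth-first enumeration of dense subspace cells (illustrative only).

    An object supports a cell if it agrees with the cell on every observed
    dimension and has at most `max_missing` missing values among the cell's
    dimensions. Under this assumed definition, support can only shrink when
    a dimension is added, so a cell below `min_support` is never extended.
    """
    n_dims = len(rows[0]) if rows else 0
    found: List[Cluster] = []

    def extend(dims, cell, members):
        # members maps object id -> missing values already tolerated
        start = dims[-1] + 1 if dims else 0
        for d in range(start, n_dims):
            groups: Dict[str, Dict[int, int]] = {}
            wild: Dict[int, int] = {}
            for i, used in members.items():
                value = rows[i][d]
                if value is None:
                    if used < max_missing:  # still within tolerance
                        wild[i] = used + 1
                else:
                    groups.setdefault(value, {})[i] = used
            for value, exact in sorted(groups.items()):
                group = {**exact, **wild}  # missing values match any cell
                if len(group) < min_support:
                    continue  # prune: no higher-dimensional cell can be dense
                new_dims, new_cell = dims + (d,), cell + (value,)
                found.append((new_dims, new_cell, tuple(sorted(group))))
                extend(new_dims, new_cell, group)

    extend((), (), {i: 0 for i in range(len(rows))})
    return found


if __name__ == "__main__":
    rows = [("a", "x", "p"), ("a", None, "p"), ("a", "x", None), ("b", "y", "q")]
    for dims, cell, objects in dense_cells(rows, min_support=3, max_missing=1):
        print(dims, cell, objects)
```

With `max_missing=0` the sketch reduces to an ordinary depth-first search for dense cells that ignores objects with missing values; raising the cap lets incomplete objects keep supporting a cell. A relative threshold, as named in the abstract, would need a more careful monotonicity argument than this absolute cap.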
TIAN Jinhua (田进华), SUN Li (孙利). A Fault-tolerance Subspace Clustering Algorithm in Data Mining[J]. Computer Engineering (计算机工程).