A Fault-tolerance Subspace Clustering Algorithm in Data Mining

doi:10.3969/j.issn.1000-3428.2016.02.038

Abstract

To improve the computational efficiency of existing subspace clustering algorithms, this paper gives a general definition of fault-tolerant subspace clustering in which the number of missing values is constrained by object, dimension, and pattern tolerances together with relative thresholds, and proves that the definition is monotonic. On this basis, it proposes a fault-tolerant subspace clustering algorithm that handles missing values in constrained attributes. The algorithm searches the subspace grid depth-first and deletes redundant low-dimensional clusters, which avoids traversing every subspace and improves clustering efficiency. Experimental results on real and synthetic data show that the algorithm runs 60% to 90% faster on average than the CLIQUE and SCHISM clustering algorithms, and that it quickly obtains subspace clustering results with high clustering quality even when values are missing.

Key words: data analysis, multi-attribute, missing value, clustering, monotonicity, fault-tolerance
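
The abstract does not give the formal definition or the search procedure, so the following Python sketch is only an illustration of the two ideas it names: a tolerance check that bounds missing values per object, per dimension, and over the whole pattern, and a depth-first subspace search that relies on monotonicity to prune. The function names, the parameter names (`eps_obj`, `eps_dim`, `eps_pattern`), and the use of `None` for a missing value are assumptions, not the authors' notation, and the removal of redundant low-dimensional clusters is not shown.

```python
def is_fault_tolerant(data, objects, dims, eps_obj, eps_dim, eps_pattern):
    """Check missing-value tolerances for a candidate cluster (objects x dims).

    data: list of rows; a missing value is represented by None (assumption).
    eps_*: relative thresholds in [0, 1] (hypothetical parameter names).
    """
    if not objects or not dims:
        return False
    missing = [[data[o][d] is None for d in dims] for o in objects]
    # Each object may miss at most eps_obj of the cluster's dimensions.
    if any(sum(row) > eps_obj * len(dims) for row in missing):
        return False
    # Each dimension may be missing in at most eps_dim of the cluster's objects.
    if any(sum(col) > eps_dim * len(objects) for col in zip(*missing)):
        return False
    # The whole pattern may contain at most eps_pattern missing cells.
    total_missing = sum(sum(row) for row in missing)
    return total_missing <= eps_pattern * len(objects) * len(dims)


def dfs_subspaces(n_dims, is_candidate, prefix=()):
    """Enumerate subspaces depth-first, yielding those accepted by is_candidate.

    Assumes monotonicity: if a subspace is rejected, every superset is also
    rejected, so the search never extends a rejected prefix.
    """
    start = prefix[-1] + 1 if prefix else 0
    for d in range(start, n_dims):
        subspace = prefix + (d,)
        if is_candidate(subspace):
            yield subspace
            yield from dfs_subspaces(n_dims, is_candidate, subspace)


# Toy data: three objects, three dimensions, two missing cells.
data = [
    [1.0, 2.0, None],
    [1.1, None, 3.0],
    [0.9, 2.1, 3.1],
]
loose = is_fault_tolerant(data, [0, 1, 2], [0, 1, 2], 0.4, 0.4, 0.25)
strict = is_fault_tolerant(data, [0, 1, 2], [0, 1, 2], 0.4, 0.4, 0.2)
# Toy predicate: reject any subspace that contains dimension 1.
found = list(dfs_subspaces(3, lambda s: 1 not in s))
```

In a fuller implementation the predicate passed to `dfs_subspaces` would test whether a subspace still contains a dense, fault-tolerant group of objects; here it is a placeholder that only demonstrates the pruning order.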


CLC Number: TP391

TIAN Jinhua, SUN Li. A Fault-tolerance Subspace Clustering Algorithm in Data Mining[J]. Computer Engineering.



URL: https://www.ecice06.com/EN/Y2016/V42/I2/210

Editor: LU Yanfei
