A Fault-tolerance Subspace Clustering Algorithm in Data Mining

doi:10.3969/j.issn.1000-3428.2016.02.038

Computer Engineering

TIAN Jinhua, SUN Li

(School of Information Engineering, Huanghuai University, Zhumadian, Henan 463000, China)

Received: 2014-12-05 Online: 2016-02-15 Published: 2016-01-29
About the authors: TIAN Jinhua (born 1981), male, experimentalist, M.S.; main research interests: big data processing and multi-core systems. SUN Li, associate professor, M.S.
Funding:
Henan Province Science and Technology Research Program (122102210430); Key Science and Technology Research Project of the Education Department of Henan Province (14B520036).

Abstract

Abstract: To improve the computational efficiency of existing subspace clustering algorithms, this paper bounds the number of missing values through object, dimension, and pattern tolerances together with relative thresholds, gives a general definition of fault-tolerant subspace clustering, and proves that the definition is monotonic. On this basis it proposes a fault-tolerant subspace clustering algorithm that handles missing values in the constrained attributes. The algorithm searches the subspace grid depth-first, deletes redundant low-dimensional clusters, and avoids traversing every subspace, which improves clustering efficiency. Experimental results on real and synthetic data show that, compared with the CLIQUE and SCHISM clustering algorithms, the proposed algorithm runs 60% to 90% faster on average, obtains subspace clustering results quickly even when values are missing, and achieves high clustering quality.

Key words: data analysis, multi-attribute, missing value, clustering, monotonicity, fault-tolerance
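The abstract names three tolerances (object, dimension, pattern) with relative thresholds that bound how many missing values a cluster may contain, but it does not give the formal definition. The following is only a minimal sketch of one plausible reading, not the authors' definition: the function name `is_fault_tolerant`, the threshold names `eps_obj`, `eps_dim`, `eps_pat`, and the use of `None`/NaN to mark missing values are all assumptions made for illustration.

```python
import math


def is_fault_tolerant(data, objects, dims, eps_obj, eps_dim, eps_pat):
    """Check a candidate cluster (objects x dims) against three relative
    missing-value thresholds. All names here are hypothetical.

    data    : list of rows; a missing value is None or float('nan')
    objects : row indices in the candidate cluster
    dims    : column indices (the candidate subspace)
    eps_*   : allowed fraction of missing values, each in [0, 1]
    """
    if not objects or not dims:
        return False

    def missing(o, d):
        v = data[o][d]
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Object tolerance: no object may miss too many of the cluster's dimensions.
    for o in objects:
        if sum(missing(o, d) for d in dims) > eps_obj * len(dims):
            return False

    # Dimension tolerance: no dimension may be missing for too many objects.
    for d in dims:
        if sum(missing(o, d) for o in objects) > eps_dim * len(objects):
            return False

    # Pattern tolerance: bound the total share of missing cells in the cluster.
    total = sum(missing(o, d) for o in objects for d in dims)
    return total <= eps_pat * len(objects) * len(dims)
```

Under this reading, a check of this kind would be applied to each candidate subspace during the depth-first search; how the paper combines it with the density criterion and the monotonicity-based pruning is not stated in the abstract.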

CLC number: TP391

TIAN Jinhua, SUN Li. A Fault-tolerance Subspace Clustering Algorithm in Data Mining[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2016.02.038.

http://www.ecice06.com/CN/Y2016/V42/I2/210

Editor: LU Yanfei

