Clustering Algorithm for Data Stream with Heterogeneous Attributes Based on Information Entropy Dimension Reduction

doi:10.3969/j.issn.1000-3428.2011.19.026

Computer Engineering, 2011, Vol. 37, Issue (19): 82-84, 87.


TAN Jian-jian, ZHENG Hong-yuan, DING Qiu-lin

(College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)

Received: 2011-03-01 Online: 2011-10-05 Published: 2011-10-05

About the authors: TAN Jian-jian (born 1985), male, master's student; research interests: data mining and information security. ZHENG Hong-yuan, associate professor, Ph.D. DING Qiu-lin, professor, doctoral supervisor.

Abstract

Existing data stream clustering algorithms cannot handle data streams with high-dimensional heterogeneous (mixed) attributes. To address this problem, this paper improves the off-line and on-line clustering processes of the HPStream algorithm: categorical attributes are handled with a frequency matrix, and data dimensionality is reduced by an information-entropy-based categorical attribute selection method. Experimental results show that the algorithm can effectively handle data sets with heterogeneous attributes and relatively high dimensionality, and that its clustering precision is 5% to 15% higher than that of HPStream.

Key words: data stream mining, heterogeneous attributes, frequency matrix, information entropy, dimension reduction
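The abstract names two mechanisms without defining them on this page: a frequency matrix for categorical attributes and an information-entropy-based selection of categorical attributes. Below is a minimal Python sketch of one plausible reading, not the authors' method. It assumes that a cluster's frequency matrix stores, for each categorical attribute, a (possibly faded) count per observed value; that an attribute's Shannon entropy inside a cluster measures how scattered its values are; and that the lowest-entropy attributes are the ones kept, loosely mirroring how HPStream keeps, per cluster, the dimensions with the smallest spread. The class name `CategoricalSummary`, the `2**(-decay_lambda * elapsed)` fading factor, the mismatch distance, and the toy data are all illustrative assumptions.

```python
import math
from collections import defaultdict


class CategoricalSummary:
    """Hypothetical per-cluster frequency matrix: one value -> count row per categorical attribute."""

    def __init__(self, num_attrs):
        self.num_attrs = num_attrs
        # rows[j][v] is the (possibly faded) count of value v for categorical attribute j.
        self.rows = [defaultdict(float) for _ in range(num_attrs)]
        self.weight = 0.0  # total (possibly faded) number of points absorbed

    def add(self, point):
        """Absorb one point, given as a tuple of categorical values."""
        for j, value in enumerate(point):
            self.rows[j][value] += 1.0
        self.weight += 1.0

    def fade(self, decay_lambda, elapsed):
        """Scale all counts by 2**(-decay_lambda * elapsed); assumed HPStream-style decay."""
        factor = 2.0 ** (-decay_lambda * elapsed)
        for row in self.rows:
            for value in row:
                row[value] *= factor
        self.weight *= factor

    def entropy(self, j):
        """Shannon entropy (in bits) of attribute j's value distribution inside this cluster."""
        total = sum(self.rows[j].values())
        h = 0.0
        for count in self.rows[j].values():
            if count > 0:
                p = count / total
                h -= p * math.log2(p)
        return h

    def select_attributes(self, k):
        """Assumed criterion: keep the k attributes with the lowest entropy (most concentrated)."""
        ranked = sorted(range(self.num_attrs), key=lambda j: (self.entropy(j), j))
        return sorted(ranked[:k])

    def distance(self, point, attrs):
        """Assumed mismatch distance in [0, 1], computed on the selected attributes only."""
        if not attrs or self.weight == 0:
            return 1.0
        d = 0.0
        for j in attrs:
            total = sum(self.rows[j].values())
            d += 1.0 - self.rows[j].get(point[j], 0.0) / total
        return d / len(attrs)


# Toy stream of 3-attribute categorical points (illustrative data only).
summary = CategoricalSummary(num_attrs=3)
for p in [("tcp", "http", "SF"), ("tcp", "ftp", "SF"),
          ("tcp", "smtp", "SF"), ("tcp", "http", "REJ")]:
    summary.add(p)

kept = summary.select_attributes(2)
print(kept)                                            # [0, 2]
print(summary.distance(("tcp", "http", "SF"), kept))   # 0.125
```

In the toy stream, attribute 0 takes a single value (entropy 0 bits) and attribute 2 is mostly `SF` (about 0.81 bits), while attribute 1 spreads over three values (1.5 bits), so attributes 0 and 2 are retained and the distance to a new point is computed on those two only. Because counts are additive, two such summaries could also be merged by summing their rows, which is what makes a count-based summary convenient for stream micro-clusters.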

CLC number: TP311

TAN Jian-jian, ZHENG Hong-yuan, DING Qiu-lin. Clustering Algorithm for Data Stream with Heterogeneous Attributes Based on Information Entropy Dimension Reduction[J]. Computer Engineering, 2011, 37(19): 82-84, 87.

http://www.ecice06.com/CN/Y2011/V37/I19/82

