Research of Distributed Clustering Algorithm Based on Density

Computer Engineering (计算机工程), 2008, Vol. 34, Issue (17): 65-67,7. doi: 10.3969/j.issn.1000-3428.2008.17.024

ZHENG Jin-bin (郑金彬)1, ZHUO Yi-bao (卓义宝)2

(1. College of Mathematics and Computer Science, Longyan University, Longyan 364000; 2. Department of Computer Science, Xiamen University, Xiamen 361005)

Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-09-05 Published: 2008-09-05

Abstract

Large volumes of complex, heterogeneous data reside on different sites connected by networks, and distributed clustering is an important application in massive data processing. This paper proposes an improved algorithm based on the Density Based Distributed Clustering (DBDC) algorithm. It uses local clustering to obtain better representative objects and sends the set of representatives, together with related information, to the main site. The main site performs global clustering with an enhanced density-based clustering algorithm, and the clustering at each sub-site is then updated. Theoretical analysis and experimental results show that the algorithm outperforms DBDC in both clustering quality and efficiency.
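The abstract names four steps (local clustering, selection of representatives, global clustering at the main site, and updating each sub-site) without giving their details. The Python sketch below illustrates that general flow only. It is not the paper's improved algorithm: the greedy choice of representatives, the covering radius, the setting `eps_global = 2 * eps`, the use of `min_pts=1` in the global step, and all function names are assumptions made for illustration.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 marks noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels, core

def local_representatives(points, eps, min_pts):
    """Assumed selection rule: greedily keep core points that lie more than
    eps away from every representative chosen so far."""
    _, core = dbscan(points, eps, min_pts)
    reps = []
    for i, p in enumerate(points):
        if core[i] and all(dist(p, r) > eps for r, _ in reps):
            # Assumed covering radius: eps plus the distance to the
            # farthest core point inside the eps-neighborhood of p.
            reach = max(dist(p, q) for j, q in enumerate(points)
                        if core[j] and dist(p, q) <= eps)
            reps.append((p, eps + reach))
    return reps

def global_clustering(site_reps, eps_global):
    """Main site: cluster all representatives received from the sub-sites.
    min_pts=1 (an assumption) lets a lone representative keep its cluster."""
    reps = [rep for site in site_reps for rep in site]
    labels, _ = dbscan([p for p, _ in reps], eps_global, 1)
    return [(p, radius, label) for (p, radius), label in zip(reps, labels)]

def relabel_site(points, global_reps):
    """Sub-site update: each point takes the global label of the nearest
    representative whose covering radius contains it, else stays noise."""
    labels = []
    for p in points:
        best, label = float("inf"), NOISE
        for r, radius, g in global_reps:
            d = dist(p, r)
            if d <= radius and d < best:
                best, label = d, g
        labels.append(label)
    return labels

# Toy data: two sites whose clusters near the origin are adjacent.
eps, min_pts = 1.0, 3
site_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (20.0, 20.0)]
site_b = [(1.0, 0.0), (1.5, 0.0), (1.0, 0.5), (1.5, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]

site_reps = [local_representatives(s, eps, min_pts) for s in (site_a, site_b)]
global_reps = global_clustering(site_reps, eps_global=2 * eps)
labels_a = relabel_site(site_a, global_reps)
labels_b = relabel_site(site_b, global_reps)
print(labels_a)  # [0, 0, 0, 0, -1]
print(labels_b)  # [0, 0, 0, 0, 1, 1, 1]
```

In this toy data the clusters near the origin on the two sites merge into global cluster 0, the second cluster on site B becomes global cluster 1, and the isolated point on site A remains noise. Only the three representatives and their radii travel to the main site, which is the communication saving that representative-based distributed clustering aims for.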

Key words: data mining, distributed clustering, specific core objects

CLC number: TP301.6

ZHENG Jin-bin, ZHUO Yi-bao. Research of Distributed Clustering Algorithm Based on Density[J]. Computer Engineering, 2008, 34(17): 65-67,7.

http://www.ecice06.com/CN/Y2008/V34/I17/65
