Research on Fuzzy C-means Algorithm on MapReduce Model

doi:10.3969/j.issn.1000-3428.2014.10.010

Computer Engineering

WANG Yong-gui, LI Hong-xu, SONG Xiao

(College of Software, Liaoning Technical University, Huludao 125105, China)

Received: 2013-09-29 Online: 2014-10-15 Published: 2014-10-13
About the authors: WANG Yong-gui (born 1967), male, professor, master's degree; main research interests: cloud computing, green computing, and data mining. LI Hong-xu and SONG Xiao, master's students.
Funding:
National Natural Science Foundation of China (60903082); Foundation of the Education Department of Liaoning Province (L2012113).

Abstract

Abstract: The fuzzy C-means (FCM) algorithm must iterate repeatedly to compute the membership values of the sample data and the cluster centers. To handle FCM clustering on massive data, this paper applies the MapReduce model and proposes an efficient FCM algorithm. The Map stage computes the membership degrees and the Reduce stage computes the cluster centers, so each iteration launches one complete MapReduce job. Over multiple iterations, the algorithm computes the membership values and cluster centers and updates the cluster center file for use by the next job, repeating this process until the final clustering result is obtained. Experimental results show that the algorithm effectively reduces the number of iterations in the MapReduce computation and thereby improves overall execution efficiency.

Key words: fuzzy C-means (FCM) algorithm, MapReduce model, massive data, high efficiency, iteration
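
The Map/Reduce split described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not give the update formulas, so the code assumes the standard FCM updates (membership u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)) and center c_i = sum_j u_ij^m x_j / sum_j u_ij^m), and it does not reproduce whatever optimization the paper uses to reduce the iteration count. The names `memberships`, `fcm_map`, `fcm_reduce`, and `fcm` are hypothetical, a dictionary stands in for the shuffle phase, and the `centers` list stands in for the cluster center file passed between jobs.

```python
import math
from collections import defaultdict


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one sample in every cluster (assumed formula)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The sample sits exactly on one or more centers: share full membership.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def fcm_map(point, centers, m=2.0):
    """Map step: emit (cluster index, (u^m * point, u^m)) for one sample."""
    for i, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield i, ([w * x for x in point], w)


def fcm_reduce(values, old_center):
    """Reduce step: new center = sum(u^m * x) / sum(u^m) for one cluster."""
    num = [0.0] * len(old_center)
    den = 0.0
    for weighted_point, w in values:
        for k, x in enumerate(weighted_point):
            num[k] += x
        den += w
    if den == 0.0:
        # No sample has any weight in this cluster: keep the previous center.
        return list(old_center)
    return [n / den for n in num]


def fcm(points, centers, m=2.0, eps=1e-6, max_iter=100):
    """Driver: one simulated MapReduce job per iteration until centers settle."""
    iteration = 0
    for iteration in range(1, max_iter + 1):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for p in points:
            for key, value in fcm_map(p, centers, m):
                grouped[key].append(value)
        new_centers = [fcm_reduce(grouped[i], c) for i, c in enumerate(centers)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers  # "update the cluster center file" for the next job
        if shift < eps:
            break
    return centers, iteration


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    final_centers, n_iter = fcm(data, [[2.0, 2.0], [8.0, 8.0]])
    print(final_centers, n_iter)
```

Emitting the pair (u^m * x, u^m) from the mapper keeps the reducer a plain sum followed by one division, which is the shape that would also allow a combiner in a real MapReduce job; that design choice is an assumption of this sketch rather than something the abstract states.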

CLC number: TP391.41

WANG Yong-gui, LI Hong-xu, SONG Xiao. Research on Fuzzy C-means Algorithm on MapReduce Model[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2014.10.010.

http://www.ecice06.com/CN/Y2014/V40/I10/47

Editor: LU Yan-fei

