
Computer Engineering

• Advanced Computing and Data Processing •

Distributed Clustering Optimization and Load Balancing Algorithm for Massive Small-size Data

WANG Mingming, CHEN Qingkui

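  1. (University of Shanghai for Science and Technology: a. School of Optical-Electrical and Computer Engineering; b. School of Business, Shanghai 200093, China)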
  • Received: 2017-01-18  Published: 2018-02-15  Online: 2018-02-15
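  • About the authors: WANG Mingming (born 1993), female, master's student; research interests: distributed computing and Internet of Things technology. CHEN Qingkui, professor, Ph.D., doctoral supervisor.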
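  • Supported by:
    National Natural Science Foundation of China (61572325); Shanghai Key Science and Technology Project (14511107902, 16DZ1203603); Shanghai Engineering Center Construction Project (GCZX 14014); Shanghai Engineering Center for Large-scale Internet of Things Common Technologies for Smart Homes (GCZX14014).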

Distributed Clustering Optimization and Load Balancing Algorithm for Massive Small-size Data

WANG Mingming(a), CHEN Qingkui(a,b)

  1. (a. School of Optical-Electrical and Computer Engineering; b. School of Business, University of Shanghai for Science and Technology, Shanghai 200093, China)
  • Received: 2017-01-18  Online: 2018-02-15  Published: 2018-02-15

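Abstract: The centralized sensor clustering algorithm in the SensorFS system makes the master node a system bottleneck and runs slowly when the number of sensors is large. To address this, a distributed sensor clustering algorithm and a fine-grained load balancing algorithm are designed to improve the system. The master node is responsible only for initial write scheduling; when a sensor issues a subsequent write request, it interacts directly with the corresponding ChunkServer node. Within each ChunkServer node, sensors are clustered using a sensing dependency graph, and the resulting sensor classes are then clustered by the master node. On this basis, server load is computed from the rate at which each sensor produces data, and fine-grained migration is performed with the sensor class as the smallest unit. Experimental results show that the distributed clustering algorithm and the load balancing algorithm effectively improve the read/write performance of the Hadoop Distributed File System for massive small sensor data.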

Key words: massive small-size data, distributed system, clustering, load balancing, sensor

Abstract: In SensorFS, the centralized sensor clustering algorithm easily makes the master node a bottleneck, and clustering is slow when the number of sensors is large. To address this, a distributed sensor clustering algorithm and a sensor-based load balancing algorithm are proposed. The master node is responsible only for the initial scheduling of sensors; afterwards, each sensor interacts directly with its assigned ChunkServer node. Inside each ChunkServer node, the sensors are divided into multiple classes by a sensor clustering algorithm based on the Sensing Dependency Graph (SDG), and these sensor classes are then clustered by the master node. Furthermore, taking into account the different file arrival rate of each sensor, load balancing is performed at the granularity of sensor classes. Experimental results show that the distributed sensor clustering algorithm and the sensor-class-based load balancing algorithm effectively improve the read/write performance of the Hadoop Distributed File System (HDFS) for massive small-size data.

Key words: massive small-size data, distributed system, clustering, load balancing, sensor
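
The load balancing idea in the abstract (server load derived from per-sensor data rates, with whole sensor classes as the unit of migration) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the greedy move rule, the `tolerance` threshold, and every name (`SensorClass`, `Server`, `rebalance`) are assumptions made here for illustration, and rates are in arbitrary units.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SensorClass:
    """A group of sensors that is always migrated as one unit (hypothetical type)."""
    name: str
    rates: dict[str, float]  # sensor id -> data production rate (arbitrary units)

    @property
    def load(self) -> float:
        # Load of a class = sum of the data rates of its sensors.
        return sum(self.rates.values())


@dataclass
class Server:
    """A storage node holding zero or more sensor classes (hypothetical type)."""
    name: str
    classes: list[SensorClass] = field(default_factory=list)

    @property
    def load(self) -> float:
        # Load of a server = sum of the loads of the classes it hosts.
        return sum(c.load for c in self.classes)


def rebalance(servers: list[Server], tolerance: float = 0.1,
              max_moves: int = 100) -> list[tuple[str, str, str]]:
    """Greedy rebalancing (an assumed heuristic, not taken from the paper).

    Repeatedly moves one whole sensor class from the most loaded server to
    the least loaded one, as long as the move narrows the gap between them.
    Expects a non-empty list of servers.
    Returns the list of moves as (class name, source server, target server).
    """
    moves = []
    for _ in range(max_moves):
        src = max(servers, key=lambda s: s.load)
        dst = min(servers, key=lambda s: s.load)
        gap = src.load - dst.load
        mean = sum(s.load for s in servers) / len(servers)
        if gap <= tolerance * mean:
            break  # loads are already close enough
        # Only classes with 0 < load < gap strictly reduce the imbalance.
        candidates = [c for c in src.classes if 0 < c.load < gap]
        if not candidates:
            break  # no single class can be moved without overshooting
        # Prefer the class whose load is closest to half the gap.
        best = min(candidates, key=lambda c: abs(c.load - gap / 2))
        src.classes.remove(best)
        dst.classes.append(best)
        moves.append((best.name, src.name, dst.name))
    return moves
```

Because a class is never split, the result is only as balanced as the class sizes allow; a class larger than the current gap stays where it is.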

CLC number: