Abstract:
In SensorFS, the centralized sensor clustering algorithm easily turns the master node into a bottleneck, and clustering is time-consuming when the data volume is massive. To address this, a distributed sensor clustering algorithm and a load balancing algorithm based on sensor classes are proposed. The master node is responsible only for the initial scheduling of each sensor; afterwards, the sensor interacts directly with its assigned ChunkServer node. Within each ChunkServer node, the sensors are divided into classes by a clustering algorithm based on the Sensing Dependency Graph (SDG), and the resulting sensor classes are then clustered by the master node. Furthermore, taking the different file arrival rate of each sensor into account, load balancing is performed with the sensor class as the unit. Experimental results show that, for massive small-size data in the Hadoop Distributed File System (HDFS), the distributed sensor clustering algorithm and the sensor-class-based load balancing algorithm effectively improve the read/write performance of the system.
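The load balancing step can be illustrated with a minimal sketch. The abstract does not give the algorithm itself, so everything below is an assumption made for illustration: the names `SensorClass`, `ChunkServer`, and `rebalance`, the `tolerance` threshold, and the greedy strategy of moving one whole sensor class at a time from the most loaded to the least loaded server. Only two ideas come from the abstract: load is derived from the file arrival rate, and the sensor class is the unit of migration.

```python
from dataclasses import dataclass, field


@dataclass
class SensorClass:
    name: str
    arrival_rate: float  # assumed unit: files per second, summed over the class


@dataclass
class ChunkServer:
    name: str
    classes: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Assumption: server load is the total arrival rate of its sensor classes.
        return sum(c.arrival_rate for c in self.classes)


def rebalance(servers, tolerance=0.1, max_moves=100):
    """Greedy sketch: migrate whole sensor classes until the load gap is small.

    Returns a list of (class_name, source_server, destination_server) moves.
    """
    moves = []
    for _ in range(max_moves):
        src = max(servers, key=lambda s: s.load)
        dst = min(servers, key=lambda s: s.load)
        gap = src.load - dst.load
        mean = sum(s.load for s in servers) / len(servers)
        if gap <= tolerance * mean:
            break  # loads are already close enough
        # Only a class with 0 < rate < gap strictly narrows the gap when moved.
        candidates = [c for c in src.classes if 0 < c.arrival_rate < gap]
        if not candidates:
            break  # no single class can be moved without making things worse
        # Moving a class whose rate is nearest gap/2 equalizes the pair best.
        best = min(candidates, key=lambda c: abs(gap / 2 - c.arrival_rate))
        src.classes.remove(best)
        dst.classes.append(best)
        moves.append((best.name, src.name, dst.name))
    return moves
```

Because a sensor class is never split, sensors that were clustered together stay on the same ChunkServer after migration; the price is that the final loads may remain uneven when the classes are coarse.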
Key words:
massive small-size data,
distributed system,
clustering,
load balancing,
sensor
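Abstract (translated from the Chinese): The centralized sensor clustering algorithm in SensorFS makes the master node a system bottleneck and runs slowly when the number of sensors is large. To address this, a distributed sensor clustering algorithm and a fine-grained load balancing algorithm are designed to improve the system. The master node handles only the initial write scheduling; when a sensor issues subsequent write requests, it interacts directly with the corresponding ChunkServer node. Within each ChunkServer node, sensors are clustered using the sensing dependency graph, and the resulting sensor classes are then clustered by the master node. On this basis, server load is computed from the rate at which each sensor produces data, and fine-grained migration is performed with the sensor class as the smallest unit. Experimental results show that the distributed clustering algorithm and the load balancing algorithm effectively improve the read/write performance of the Hadoop Distributed File System for massive small sensor data.
Key words (translated from the Chinese):
massive small data,
distributed system,
clustering,
load balancing,
sensor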
WANG Mingming, CHEN Qingkui. Distributed Clustering Optimization and Load Balancing Algorithm for Massive Small-size Data [J]. Computer Engineering.
汪明明, 陈庆奎. Distributed Clustering Optimization and Load Balancing Algorithm for Massive Small-size Data [J]. 计算机工程 (Computer Engineering).