Computer Engineering (计算机工程)

• Architecture and Software Technology

Improved Strategy of HDFS Replica Placement Based on Support Vector Machine

LUO Jun, CHEN Shiqiang

  1. (College of Computer Science, Chongqing University, Chongqing 400044, China)
  • Received: 2014-07-21 Online: 2015-11-15 Published: 2015-11-13
  • About the authors: LUO Jun (born 1961), male, associate professor, master's degree; main research interests: database technology, system modeling and design. CHEN Shiqiang, master's student.

Abstract: To store very large data sets and improve fault tolerance, the Hadoop Distributed File System (HDFS) adopts a rack-aware, multi-replica placement strategy. However, this strategy does not take the differences among node servers into account during placement, which leads to load imbalance across the cluster. In addition, HDFS selects the remote-rack node for a replica at random, which can result in a long network distance between nodes, so that transferring data between them consumes a great deal of time. To address these problems, this paper proposes a replica placement strategy based on the Support Vector Machine (SVM). It finds the best node for each replica by jointly considering node load (disk utilization), node hardware performance, and the network distance between nodes. Experimental results show that, compared with the original HDFS replica placement strategy, the proposed strategy achieves load balancing more effectively.

Key words: Support Vector Machine (SVM), cloud storage, replica placement policy, Distributed File System (DFS), load balancing, rack awareness
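
The abstract names three inputs to the placement decision (node load, hardware performance, network distance) but does not give the feature encoding, kernel, or training procedure. The following is only a minimal sketch of how such a selection could look, assuming a linear SVM trained by subgradient descent on the hinge loss, features normalized to [0, 1], and candidates ranked by decision value. The function names, the training samples, and the node names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed feature vector per DataNode, each value normalized to [0, 1]:
#   (disk_utilization, hardware_score, network_distance)
Features = Sequence[float]


def train_linear_svm(samples: List[Features], labels: List[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a linear soft-margin SVM by subgradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 regularization term shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample violates the margin: take a hinge subgradient step.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(w: Sequence[float], b: float, x: Features) -> float:
    """Signed decision value; larger means 'more suitable' under the model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def choose_node(w: Sequence[float], b: float,
                candidates: Dict[str, Features]) -> str:
    """Return the candidate node with the highest decision value."""
    return max(candidates, key=lambda name: score(w, b, candidates[name]))


# Hypothetical training data: +1 = suitable target, -1 = unsuitable target.
GOOD = [(0.20, 0.90, 0.10), (0.30, 0.80, 0.20),
        (0.10, 0.70, 0.30), (0.25, 0.85, 0.15)]
BAD = [(0.90, 0.30, 0.80), (0.80, 0.20, 0.90),
       (0.70, 0.40, 0.70), (0.85, 0.25, 0.60)]
TRAIN_X = GOOD + BAD
TRAIN_Y = [1] * len(GOOD) + [-1] * len(BAD)

weights, bias = train_linear_svm(TRAIN_X, TRAIN_Y)

# Hypothetical candidate DataNodes for the next replica.
candidate_nodes = {
    "datanode-a": (0.15, 0.90, 0.10),
    "datanode-b": (0.90, 0.20, 0.90),
    "datanode-c": (0.60, 0.50, 0.50),
}
print(choose_node(weights, bias, candidate_nodes))
```

Ranking by decision value rather than by the predicted class alone is a design choice of this sketch: it still yields a single target when several nodes are classified as suitable, or when none are.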

CLC number: