
Computer Engineering (计算机工程)

• Advanced Computing and Data Processing •

Energy Conservation Storage Strategy of HDFS for Massive News Data

ZHONG Jiang, YANG Lei

  1. (College of Computer Science, Chongqing University, Chongqing 400044, China)
  • Received: 2014-11-18 Online: 2015-12-15 Published: 2015-12-15
  • About the authors: ZHONG Jiang (born 1974), male, professor, Ph.D.; main research interests: data mining, e-commerce, and cloud computing. YANG Lei, master's student.
  • Supported by:
    National "973" Program of China, project "Basic Research on Efficient and Trustworthy Virtual Computing Environments" (2011CB302600); Fundamental Research Funds for the Central Universities (CDJZR185502).

Abstract: Based on the access patterns of news data, this paper proposes an improved Hadoop Distributed File System (HDFS). Through data node partitioning, file migration, and node standby strategies, nodes without tasks are switched to standby mode, which yields efficient, energy-saving storage. The traditional HDFS file write mechanism is also modified: data blocks are written preferentially to the active node with the largest remaining space, so that files created in the same time period are spread across different nodes as far as possible. This increases the probability that nodes can enter standby and also addresses uneven data distribution in the cluster. Experimental results show that HDFS with the energy conservation storage strategy consumes more than 20% less energy than traditional HDFS, that the read response time of 99.9% of files is unaffected, and that the system retains good data storage and access performance.

Key words: file storage, energy conservation, node partition, file migration, node matching
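
The write rule summarized in the abstract (prefer the active node with the largest remaining space) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `DataNode`, `pick_write_target`, and `place_blocks` are hypothetical, and replication, node partitioning, and file migration are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataNode:
    """Hypothetical model of a data node; not an HDFS class."""
    name: str
    free_bytes: int
    active: bool  # False means the node is in standby


def pick_write_target(nodes: List[DataNode], block_size: int) -> Optional[DataNode]:
    """Return the active node with the most free space that can hold the block."""
    candidates = [n for n in nodes if n.active and n.free_bytes >= block_size]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_bytes)


def place_blocks(nodes: List[DataNode], block_sizes: List[int]) -> List[str]:
    """Place blocks one by one, updating free space after each write."""
    placement = []
    for size in block_sizes:
        target = pick_write_target(nodes, size)
        if target is None:
            raise RuntimeError("no active node has room for the block")
        target.free_bytes -= size  # the chosen node now has less space
        placement.append(target.name)
    return placement


if __name__ == "__main__":
    cluster = [
        DataNode("a", 100, True),
        DataNode("b", 90, True),
        DataNode("c", 500, False),  # standby node is never chosen
    ]
    print(place_blocks(cluster, [20, 20, 20]))
```

Because free space is updated after every write, consecutive blocks tend to alternate between active nodes, which matches the stated goal of spreading files created in the same period across different nodes while leaving standby nodes undisturbed.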
