
Computer Engineering

• Architecture and Software Technology •

Design and Implementation of HDFS Data Block Scheduling Algorithm in Heterogeneous Environment

GAO Yuan 1,2,3, REN Sheng 1,2,3, GU Wenjie 1,2,3

  1. (1. NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China; 2. NARI Technology Co., Ltd., Nanjing 211106, China; 3. State Key Laboratory of Smart Grid Protection and Control, Nanjing 211106, China)
  • Received: 2016-08-12 Online: 2017-08-15 Published: 2017-08-15
  • About the authors: GAO Yuan (born 1981), male, senior engineer, M.S.; main research interests: task scheduling, network communication, and cloud computing. REN Sheng, engineer. GU Wenjie, senior engineer, M.S.
  • Supported by:
    State Grid Corporation of China science and technology project "Research on Key Technologies for Clustering of Ultra-Large-Scale Power Grid Dispatching and Control Systems".

Abstract: To address the inefficient write performance of the Hadoop Distributed File System (HDFS), this paper proposes a scheduling algorithm for concurrent transmission of HDFS data blocks in an environment where node performance is heterogeneous. The algorithm monitors the resource status and in-memory cache queue of each node in the HDFS cluster in real time and dynamically pairs receiving nodes with forwarding nodes, so that the network cards and disks of all nodes work concurrently and the time needed to write all replicas to the distributed file system is reduced. A node requests the next data block only after the current data has been written to disk, which ensures data safety and also gives each node a number of replicas that matches its own performance, allowing a heterogeneous system to reach a high write speed. Performance tests show that the write performance of a distributed file system using the proposed algorithm is double that of the original HDFS.

Key words: heterogeneous, Hadoop Distributed File System (HDFS), concurrent, data block, scheduling

CLC number:
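
The abstract states that a node requests the next data block only after it has written the current one to disk, so each node ends up with a number of replicas that matches its performance. The following is a minimal sketch of that pull-based idea only, not the authors' implementation. The function name `pull_schedule`, the node names, and the per-block write costs are hypothetical. The sketch also leaves out the pairing of receiving and forwarding nodes, network transfer, and the rule that replicas of one block go to different nodes.

```python
import heapq


def pull_schedule(write_cost, num_replicas):
    """Simulate nodes that ask for the next replica only when they are idle.

    write_cost: dict mapping a node name to the (hypothetical) seconds it
                needs to receive one block and flush it to disk.
    num_replicas: total number of block replicas to place.
    Returns (replicas written per node, time at which the last write ends).
    """
    counts = {node: 0 for node in write_cost}
    # Each heap entry is (time at which the node becomes idle, node name).
    # Every node is idle at time 0 and therefore requests a block at once.
    idle = [(0.0, node) for node in write_cost]
    heapq.heapify(idle)
    finish = 0.0
    for _ in range(num_replicas):
        # The node that becomes idle first gets the next replica.
        start, node = heapq.heappop(idle)
        done = start + write_cost[node]
        counts[node] += 1
        finish = max(finish, done)
        # The node requests again only after this write has completed.
        heapq.heappush(idle, (done, node))
    return counts, finish


if __name__ == "__main__":
    # Toy costs chosen for illustration, not measurements from the paper.
    costs = {"fast": 1.0, "medium": 2.0, "slow": 4.0}
    counts, finish = pull_schedule(costs, 12)
    print(counts, finish)
```

With these toy costs the fast node takes 7 of the 12 replicas, the medium node 3, and the slow node 2, and the last write ends at time 8.0. These figures only illustrate the mechanism and say nothing about the speedup measured in the paper.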