
Computer Engineering

• Architecture and Software Technology •

Hadoop Task Selection Strategy Based on Disk I/O Performance

LI Qiang 1,2, SUN Zhenyu 1,2, LEI Xiaofeng 1,2, SUN Gongxing 1

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2015-10-10 Online: 2016-11-15 Published: 2016-11-15
  • About the authors: LI Qiang (born 1988), male, Ph.D. candidate, main research interest: distributed computing; SUN Zhenyu and LEI Xiaofeng, Ph.D. candidates; SUN Gongxing, research fellow and Ph.D. supervisor.
  • Supported by:
    National Natural Science Foundation of China (11375223, 11375221); Joint Fund of the National Natural Science Foundation of China and the Chinese Academy of Sciences for Large-Scale Scientific Facilities (11179020).

Abstract: Making full use of local disk I/O resources is key to improving the performance of a computing cluster, yet most scheduling algorithms in Hadoop do not take this factor into account. To address this, a task selection strategy is proposed that uses disk workload as a weighting parameter in Map task selection: during scheduling, tasks are chosen according to the workload of each disk, so that the load across the disks of a data node stays relatively balanced. A new task selection module based on this strategy is designed and integrated into the Hadoop scheduler. To further improve Hadoop performance, nearly fully localized execution of Map jobs is also implemented. Experimental results show that the proposed strategy makes full use of the local disk I/O resources of data nodes, reducing node I/O Wait by 5% on average, raising CPU utilization by 15% on average, and shortening job execution time by 20%.

Key words: Hadoop system, scheduling algorithm, data locality, task selection strategy, disk workload, I/O performance
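
The idea of choosing Map tasks by disk workload can be illustrated with a small sketch. This is not the authors' module and does not use any Hadoop API: the `MapTask` type, the `select_map_task` and `assign` functions, and the definition of disk load as the number of tasks currently reading from a disk are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MapTask:
    """Hypothetical pending Map task; `disk` is the local disk holding its input block."""
    task_id: str
    disk: Optional[str]  # None if the input block is not stored on this node


def select_map_task(pending: List[MapTask], disk_load: Dict[str, int]) -> Optional[MapTask]:
    """Return the node-local task whose input disk is currently least loaded.

    Disk load is assumed to be the number of tasks reading from the disk;
    ties are broken by task_id so the choice is deterministic.
    """
    local = [t for t in pending if t.disk in disk_load]
    if not local:
        return None
    return min(local, key=lambda t: (disk_load[t.disk], t.task_id))


def assign(pending: List[MapTask], disk_load: Dict[str, int],
           slots: int) -> Tuple[List[MapTask], Dict[str, int]]:
    """Fill up to `slots` free Map slots, updating the assumed load after each pick."""
    remaining = list(pending)
    load = dict(disk_load)  # work on a copy; the caller's dict is left unchanged
    chosen: List[MapTask] = []
    for _ in range(slots):
        task = select_map_task(remaining, load)
        if task is None:
            break
        remaining.remove(task)
        load[task.disk] += 1  # the chosen task will now read from this disk
        chosen.append(task)
    return chosen, load


if __name__ == "__main__":
    tasks = [MapTask("t1", "sda"), MapTask("t2", "sda"),
             MapTask("t3", "sdb"), MapTask("t4", "sdc")]
    picked, new_load = assign(tasks, {"sda": 2, "sdb": 0, "sdc": 1}, slots=3)
    print([t.task_id for t in picked], new_load)
```

Under these assumptions, each pick goes to the disk with the fewest active readers, which tends to even out the per-disk load on a node. A real scheduler would presumably measure load from actual I/O statistics rather than a task count.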

CLC number: