
Computer Engineering (计算机工程)

Special topic: Cloud Computing


Configuration Optimization of Hadoop Based on Recursive Random Sampling

ZHU Chunxiang 1, CHEN Shiping 1, CHEN Mingang 2

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  2. Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai 201112, China
  • Received: 2015-03-18 Published: 2016-02-15 Released online: 2016-01-29
  • About the authors: ZHU Chunxiang (born 1990), male, M.S.; main research interests: big data processing and cloud computing. CHEN Shiping, professor, Ph.D. CHEN Mingang, associate research fellow, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61472256, 61170277); Key Project of the Scientific Research Innovation Fund of the Shanghai Municipal Education Commission (12zz137); Science and Technology Innovation Fund of the Science and Technology Commission of Shanghai Municipality (13511505303); Shanghai First-Class Discipline Construction Fund (S1201YLSK).

Abstract: The Hadoop platform currently has nearly 200 configuration parameters, and setting them appropriately can improve system performance. To address the problem of optimizing Hadoop parameter configuration, a black-box optimization strategy based on recursive random sampling is proposed. The strategy exploits the high initial efficiency of random sampling and repeatedly adjusts the sample space for further rounds of random sampling, so that an approximately globally optimal configuration is found quickly. Experimental results show that, compared with the traditional configuration method, the configuration method using the black-box optimization strategy increases Hadoop job processing speed by 14% to 25% and has good stability and reliability.

Key words: Hadoop platform, black-box optimization, particle swarm algorithm, simulated annealing algorithm, recursive random sampling
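
The abstract names the technique but does not give its details. As a rough illustration only, the following Python sketch shows one common form of recursive random sampling on a black-box cost function: sample the whole space at random, then repeatedly re-sample inside a box around the best point found, shrinking the box whenever no improvement turns up. The function name, every default value, and the toy cost function (a stand-in for a measured job runtime) are assumptions made for this sketch and are not taken from the paper.

```python
import random

def recursive_random_sampling(cost, bounds, n_explore=30, n_exploit=10,
                              shrink=0.5, min_rel_width=1e-3,
                              restarts=5, max_evals=2000, seed=0):
    """Minimize a black-box cost(x) over a box given as [(lo, hi), ...].

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    best_x, best_c = None, float("inf")
    for _ in range(restarts):
        # Exploration: random samples over the whole space locate a promising point.
        scored = [(cost(x), x) for x in (sample(bounds) for _ in range(n_explore))]
        c, x = min(scored, key=lambda t: t[0])
        evals = n_explore

        # Exploitation: re-sample inside a box centred on the current best point.
        rel_width = shrink
        while rel_width > min_rel_width and evals < max_evals:
            box = [(max(lo, xi - rel_width * (hi - lo) / 2),
                    min(hi, xi + rel_width * (hi - lo) / 2))
                   for xi, (lo, hi) in zip(x, bounds)]
            improved = False
            for _ in range(n_exploit):
                y = sample(box)
                cy = cost(y)
                evals += 1
                if cy < c:
                    x, c = y, cy      # re-centre the box on the better point
                    improved = True
                    break
            if not improved:
                rel_width *= shrink   # no gain: shrink the sample space and repeat

        if c < best_c:
            best_x, best_c = x, c
    return best_x, best_c


def toy_runtime(x):
    # Hypothetical stand-in for a measured job runtime; minimum at (30, 70).
    return (x[0] - 30.0) ** 2 + (x[1] - 70.0) ** 2


BOUNDS = [(0.0, 100.0), (0.0, 100.0)]
best_x, best_c = recursive_random_sampling(toy_runtime, BOUNDS)
print(best_x, best_c)
```

In a real tuning setup, `cost` would run a benchmark job under the candidate configuration and return its measured runtime, so each evaluation is expensive; that is why the sketch caps the number of evaluations per restart.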
