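Abstract: To address node resource optimization on the Hadoop platform, a MapReduce parameter optimization strategy is proposed. The strategy obtains resource-usage feature values while a new job executes, calculates the relative distance between the new job and each job in the job feature database, and selects the configuration of the job with the minimum relative distance as the optimal configuration for the new job. If this lookup fails, the optimal configuration is obtained iteratively and the job feature database is updated. Experimental results show that, compared with the default parameter configuration, the strategy improves job execution efficiency and shortens job running time.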
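Key words:
Hadoop cluster,
MapReduce framework,
parameter optimization,
resource utilization rate,
execution efficiency,
feature database,
relative distance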
Abstract: To address node resource optimization on the Hadoop platform, this paper proposes a MapReduce parameter optimization strategy. When a new job is submitted, the strategy first obtains feature values of its resource utilization during execution and then calculates the relative distance between the new job and each job in the feature database. It then selects the configuration of the job with the minimum relative distance as the optimal configuration for the new job. If no matching configuration is found, it obtains the optimal configuration iteratively and then updates the feature database. Experimental results show that, compared with the default parameter configuration, the proposed strategy improves job execution efficiency and reduces execution time.
Key words:
Hadoop cluster,
MapReduce framework,
parameter optimization,
resource utilization rate,
execution efficiency,
feature database,
relative distance
CLC number:
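The match-then-fallback procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the feature set, the "relative distance", the failure condition, or the iterative method, so the sketch assumes: the relative distance is the Euclidean norm of per-feature relative differences with respect to the stored job; matching fails when the database is empty or the nearest job lies beyond a hypothetical `threshold`; and the iterative step is a simple coordinate search over a hypothetical candidate grid. `FeatureDB`, `relative_distance`, `match_config`, `iterative_search`, `optimize`, and the `run_time` callback (a stand-in for actually running the job and measuring its runtime) are hypothetical names. The two configuration keys are real Hadoop MapReduce property names, used here only for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Config = Dict[str, int]


@dataclass
class FeatureDB:
    """Hypothetical job feature database: (resource-usage feature vector, best config) pairs."""
    entries: List[Tuple[Tuple[float, ...], Config]] = field(default_factory=list)

    def add(self, features: Sequence[float], config: Config) -> None:
        self.entries.append((tuple(features), dict(config)))


def relative_distance(x: Sequence[float], y: Sequence[float], eps: float = 1e-9) -> float:
    """ASSUMED definition: Euclidean norm of per-feature relative differences w.r.t. stored job y."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum(((a - b) / max(abs(b), eps)) ** 2 for a, b in zip(x, y)))


def match_config(db: FeatureDB, features: Sequence[float], threshold: float) -> Optional[Config]:
    """Return the config of the nearest stored job, or None if the DB is empty or nothing is close enough."""
    if not db.entries:
        return None
    best_feat, best_cfg = min(db.entries, key=lambda e: relative_distance(features, e[0]))
    if relative_distance(features, best_feat) > threshold:
        return None  # ASSUMED failure condition
    return dict(best_cfg)


def iterative_search(start: Config, space: Dict[str, List[int]],
                     run_time: Callable[[Config], float], max_rounds: int = 5) -> Config:
    """Coordinate-search stand-in for the iterative step: try each candidate value of each
    parameter and keep it only if the measured runtime improves; stop when a round makes no progress."""
    best = dict(start)
    best_t = run_time(best)
    for _ in range(max_rounds):
        improved = False
        for name, values in space.items():
            for v in values:
                if v == best[name]:
                    continue
                cand = {**best, name: v}
                t = run_time(cand)
                if t < best_t:
                    best, best_t, improved = cand, t, True
        if not improved:
            break
    return best


def optimize(db: FeatureDB, features: Sequence[float], default: Config,
             space: Dict[str, List[int]], run_time: Callable[[Config], float],
             threshold: float = 0.2) -> Config:
    """Reuse the nearest job's config if one matches; otherwise search iteratively and update the DB."""
    cfg = match_config(db, features, threshold)
    if cfg is not None:
        return cfg
    cfg = iterative_search(default, space, run_time)
    db.add(features, cfg)
    return cfg


if __name__ == "__main__":
    # Hypothetical toy cost model standing in for real job runs.
    def fake_runtime(c: Config) -> float:
        return ((c["mapreduce.task.io.sort.mb"] - 200) ** 2 / 100
                + (c["mapreduce.job.reduces"] - 8) ** 2 + 10)

    db = FeatureDB()
    default = {"mapreduce.task.io.sort.mb": 100, "mapreduce.job.reduces": 1}
    space = {"mapreduce.task.io.sort.mb": [100, 200, 300], "mapreduce.job.reduces": [1, 4, 8, 16]}
    print(optimize(db, [0.6, 0.4, 0.3], default, space, fake_runtime))    # DB empty: iterative search
    print(optimize(db, [0.61, 0.41, 0.3], default, space, fake_runtime))  # close job: reused config
```

In this toy run, the first call finds no stored job, searches iteratively, and records the result; the second call's feature vector lies within the assumed threshold of the stored one, so its configuration is reused without further job runs. Reusing a stored configuration costs no extra job executions, which is presumably the motivation for consulting the feature database before iterating.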