
Computer Engineering

Special topic: Big Data


Data Allocation Algorithm for Large Data with All-to-all Comparison Based on Graph Covering

GAO Yanjun¹, ZHANG Xueying¹, LI Fenglian¹, TIAN Yuchu¹,²

  1. College of Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China; 2. School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane 4001, Australia
  • Received: 2017-03-10 Online: 2018-04-15 Published: 2018-04-15
  • About the authors: GAO Yanjun (born 1990), male, M.S., research interests: big data and distributed computing; ZHANG Xueying (corresponding author), professor, Ph.D., doctoral supervisor; LI Fenglian, professor, Ph.D.; TIAN Yuchu, professor, Ph.D., doctoral supervisor.
  • Funding:
    Talent Training Project of the Shanxi Province Joint Graduate Training Base (2017JD16); Shanxi Province Science and Technology Innovation Project for Outstanding Talents (201605D211021); 2016 Teaching Reform Project of Taiyuan University of Technology (24).

Abstract: In the distributed processing of all-to-all comparison problems on big data, existing data allocation strategies rarely consider the special dependency between comparison tasks and the data they need, which leads to low storage efficiency and unbalanced task allocation. To address this problem, a Data Allocation Algorithm Based on Graph Covering (DAABGC) is proposed. First, theoretical analysis reduces the data allocation problem for all-to-all comparison of big data to a graph covering problem. Then, optimal solutions are constructed for several graph covering instances, and the data are allocated according to these particular solutions. Experimental results show that, compared with the Hadoop-based data allocation strategy, the proposed algorithm guarantees 100% data locality for comparison tasks and balances the load across nodes. It also improves the storage saving rate and the overall computing performance.

Key words: distributed computing, big data, all-to-all comparison, data allocation, graph covering
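The reduction described in the abstract can be illustrated with a small sketch. Assume, as a simplification, that each node stores a subset of n data items and that comparing items i and j is local only if some node stores both; an allocation with 100% data locality is then a cover of the complete graph K_n by the nodes' subsets. The Python below does not reproduce DAABGC's optimal construction, which this page does not describe. It swaps in a hypothetical round-robin "group-pair" covering, a most-constrained-first greedy task assignment for load balance, and a hypothetical storage-saving metric measured against copying every item to every node. The names `group_pair_allocation`, `assign_tasks` and `storage_saving` are illustrative only.

```python
from itertools import combinations


def group_pair_allocation(n_items, n_groups):
    """Hypothetical covering (not the paper's): split item ids 0..n_items-1
    round-robin into n_groups blocks and create one node per pair of blocks,
    storing their union. Every pair of items then shares at least one node."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    groups = [list(range(g, n_items, n_groups)) for g in range(n_groups)]
    return [sorted(groups[a] + groups[b])
            for a, b in combinations(range(n_groups), 2)]


def assign_tasks(n_items, nodes):
    """Map each comparison pair (i, j) to a node that stores both items.

    Pairs with fewer candidate nodes are placed first; each pair then goes to
    its least-loaded candidate (ties broken by node index).
    """
    holders = {}  # item id -> set of node indices that store it
    for k, subset in enumerate(nodes):
        for item in subset:
            holders.setdefault(item, set()).add(k)

    pairs = []
    for i, j in combinations(range(n_items), 2):
        candidates = holders.get(i, set()) & holders.get(j, set())
        if not candidates:
            raise ValueError(f"pair {(i, j)} is not co-located on any node")
        pairs.append(((i, j), candidates))
    pairs.sort(key=lambda p: len(p[1]))  # stable sort: most constrained first

    load = [0] * len(nodes)
    assignment = {}
    for pair, candidates in pairs:
        k = min(candidates, key=lambda c: (load[c], c))
        load[k] += 1
        assignment[pair] = k
    return assignment, load


def storage_saving(n_items, nodes):
    """Hypothetical metric: fraction of storage saved versus copying all
    n_items to every node. The paper's own definition may differ."""
    stored = sum(len(subset) for subset in nodes)
    return 1 - stored / (n_items * len(nodes))


if __name__ == "__main__":
    nodes = group_pair_allocation(12, 4)
    assignment, load = assign_tasks(12, nodes)
    print(len(nodes), len(assignment), load, storage_saving(12, nodes))
    # prints: 6 66 [11, 11, 11, 11, 11, 11] 0.5
```

With 12 items in 4 groups, the 6 nodes each store half of the data, all 66 comparison pairs are local to some node, and the greedy step places 11 pairs on every node. Cross-group pairs have exactly one candidate node, which is why the sketch places them before the within-group pairs that can go to any of three nodes.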
