• Special Topic on Big Data •

### Graph-Covering-Based Data Allocation Algorithm for All-to-All Comparison of Big Data

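(1. College of Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China; 2. School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane 4001, Australia)
• Received: 2017-03-10 Published: 2018-04-15 Released online: 2018-04-15
• About the authors: GAO Yanjun (born 1990), male, master's degree, main research interests: big data and distributed computing; ZHANG Xueying (corresponding author), professor, Ph.D., doctoral supervisor; LI Fenglian, professor, Ph.D.; TIAN Yuchu, professor, Ph.D., doctoral supervisor.
• Funding:
Shanxi Province Graduate Joint Training Base Talent Training Project (2017JD16); Shanxi Province Outstanding Talent Science and Technology Innovation Project (201605D211021); 2016 Taiyuan University of Technology Teaching Reform Project (24).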

### Data Allocation Algorithm for Large Data with All-to-all Comparison Based on Graph Covering

GAO Yanjun¹, ZHANG Xueying¹, LI Fenglian¹, TIAN Yuchu¹˒²

(1. College of Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China; 2. School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane 4001, Australia)
• Received: 2017-03-10 Online: 2018-04-15 Published: 2018-04-15

Abstract: In distributed processing of the all-to-all comparison problem for large data, existing data allocation strategies pay little attention to the special dependency between comparison tasks and data, which leads to low storage efficiency and imbalanced task allocation. To address this problem, a Data Allocation Algorithm Based on Graph Covering (DAABGC) is proposed. First, theoretical analysis is used to formulate the data allocation problem for large data as a graph covering problem. Then, optimal solutions to several graph covering instances are constructed, and the data are allocated according to these solutions. Experimental results show that, compared with the Hadoop-based data allocation strategy, the proposed algorithm guarantees 100% data locality for comparison tasks and load balancing across nodes. It also improves the storage saving rate and overall computing performance.
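
The link between all-to-all comparison and graph covering can be illustrated with a small sketch. It is not the DAABGC algorithm itself, which the abstract does not spell out. It assumes that files are vertices of a complete graph, that each required comparison is an edge, and that a node "covers" an edge when it stores both endpoint files. The function names `cyclic_allocation` and `assign_comparisons` are hypothetical, and the 7-file, 7-node layout is a textbook cyclic construction (the Fano plane) chosen only because every pair is then co-located exactly once.

```python
from itertools import combinations


def cyclic_allocation(n_files, base_block):
    """Node i stores the files {(b + i) mod n_files : b in base_block}.

    Illustrative construction only (an assumption, not the paper's method).
    """
    return [sorted((b + i) % n_files for b in base_block) for i in range(n_files)]


def assign_comparisons(allocation, n_files):
    """Give each file pair to the least-loaded node that stores both files.

    Raises ValueError if some pair is not co-located on any node, i.e. the
    allocation does not cover every edge of the complete graph.
    """
    stored = [set(files) for files in allocation]
    tasks = [[] for _ in allocation]
    for a, b in combinations(range(n_files), 2):
        candidates = [i for i, s in enumerate(stored) if a in s and b in s]
        if not candidates:
            raise ValueError(f"pair ({a}, {b}) is not covered by any node")
        node = min(candidates, key=lambda i: len(tasks[i]))
        tasks[node].append((a, b))
    return tasks


# 7 files on 7 nodes; {0, 1, 3} is a perfect difference set modulo 7.
N_FILES = 7
allocation = cyclic_allocation(N_FILES, (0, 1, 3))
tasks = assign_comparisons(allocation, N_FILES)

# Fraction of storage saved per node, relative to an assumed baseline in
# which every node stores all files.
storage_saved = 1 - len(allocation[0]) / N_FILES

for node, (files, pairs) in enumerate(zip(allocation, tasks)):
    print(f"node {node}: files {files}, comparisons {pairs}")
print(f"storage saved per node vs. full replication: {storage_saved:.1%}")
```

In this toy layout each node stores 3 of the 7 files and receives 3 of the 21 comparisons, so every task runs on local data and the load is even. Larger or less regular instances generally do not admit such a neat construction.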