A Heterogeneous Graph Computing System for Large-Scale Dynamic Graph

doi:10.19678/j.issn.1000-3428.0068477

Abstract

When large-scale dynamic graphs are processed, GPUs are not fully utilized, and the limitations of GPU-oriented graph partitioning methods create performance bottlenecks. To improve graph computing performance, a CPU/GPU heterogeneous graph computing engine is proposed that raises the computing performance of heterogeneous processors. First, a new heterogeneous graph partitioning algorithm is proposed. It is built around streaming graph partitioning and achieves dynamic load balancing both between computing nodes and between the CPU and GPU. Its greedy strategy assigns vertices based on the maximum number of neighboring vertices during initial graph partitioning and dynamically adjusts vertex positions based on the minimum number of connected edges during iteration. Second, the system introduces a heterogeneous computing model in which the CPU and GPU cooperate through functional parallelism to improve graph computing efficiency. The experiments use PageRank, Connected Component, SSSP, and K-core as example algorithms and compare the proposed engine with other graph computing systems. Compared with graph engines that do not consider heterogeneous computing, the heterogeneous graph engine better balances the computing load of each node and the load between heterogeneous processors, which shortens delays and accelerates overall computation. The results show that the CPU/GPU synergy of the system tends to 1 and that its graph computing speedup reaches 5 times compared with other systems. The Distributed Heterogeneous Engine (DH-Engine) therefore provides a better scheme for heterogeneous graph computing.

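Abstract (translated from the Chinese version): When large-scale dynamic graphs are processed on GPU heterogeneous clusters, GPU computing resources are not fully utilized, and the limitations of GPU-oriented graph partitioning methods lead to performance bottlenecks. To improve the performance of graph computing systems, a CPU/GPU heterogeneous graph computing engine is proposed to raise the computing performance of heterogeneous processors. First, a new heterogeneous graph partitioning algorithm is proposed. With streaming graph partitioning at its core, the algorithm adjusts vertex positions through a greedy strategy and thereby achieves dynamic load balancing between computing nodes and between the CPU and GPU. During initial graph partitioning it assigns graph vertices based on the largest number of neighboring vertices, and during iteration it dynamically adjusts vertex positions based on the smallest number of connected edges. Second, a GPU-oriented heterogeneous computing model is designed that achieves cooperative computing through CPU/GPU functional parallelism. The CPU and GPU execute graph algorithms in parallel, which raises the utilization of CPU cores and in turn improves graph computing efficiency. The experiments take the graph algorithms PageRank, Connected Component, SSSP, and K-core as examples and compare the engine with other graph computing systems. Compared with graph engines that do not consider heterogeneous computing, the heterogeneous graph engine DH-Engine (Distributed Heterogeneous Engine) better balances the computing load across nodes and the load between the heterogeneous processors inside each computing node, and it raises overall computing speed by shortening local delays. The experimental results show that the CPU/GPU synergy of DH-Engine tends to 1. Compared with other graph systems, the speedup of DH-Engine's heterogeneous computing reaches 5 times. The distributed heterogeneous engine can provide a better scheme for heterogeneous graph computing.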
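The two-phase partitioning strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_adjacency`, `greedy_stream_partition`, `rebalance`), the per-partition `capacity` limit, the tie-breaking rule, and the reading of "minimum connected edges" as "fewest edges into the vertex's current partition" are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def greedy_stream_partition(adj, order, k, capacity):
    """Initial phase: place each streamed vertex where most of its
    already-placed neighbors live. Assumes k * capacity >= number of
    vertices, so at least one partition always has free space."""
    assignment = {}
    loads = [0] * k
    for v in order:
        counts = [0] * k
        for n in adj[v]:
            if n in assignment:
                counts[assignment[n]] += 1
        candidates = [p for p in range(k) if loads[p] < capacity]
        # Most placed neighbors first; ties go to the lighter partition.
        best = max(candidates, key=lambda p: (counts[p], -loads[p]))
        assignment[v] = best
        loads[best] += 1
    return assignment


def rebalance(adj, assignment, k, tolerance=1):
    """Iteration phase (assumed interpretation): while the load gap is
    too large, move the vertex with the fewest edges into the heaviest
    partition over to the lightest partition."""
    loads = [0] * k
    for p in assignment.values():
        loads[p] += 1
    # A gap of 1 cannot be reduced further, so never go below that.
    while max(loads) - min(loads) > max(tolerance, 1):
        src = loads.index(max(loads))
        dst = loads.index(min(loads))
        members = [v for v, p in assignment.items() if p == src]
        victim = min(
            members,
            key=lambda v: sum(1 for n in adj[v] if assignment.get(n) == src),
        )
        assignment[victim] = dst
        loads[src] -= 1
        loads[dst] += 1
    return assignment


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adjacency(edges)
parts = greedy_stream_partition(adj, range(6), k=2, capacity=3)
```

On this toy graph the initial phase keeps each triangle in its own partition, so only the bridging edge is cut. A real system would weight the load by the relative throughput of each node and of the CPU and GPU inside it, which this sketch does not attempt.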

Zhang Ming, Guo Wenkang, Wang Haifeng†. A Heterogeneous Graph Computing System for Large-Scale Dynamic Graph[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0068477.

张明, 郭文康, 王海峰†. 面向大规模动态图的异构图计算系统设计[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0068477.


URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0068477

